There is a lot of unnecessary duplication of content of the seccomp
material in prctl(2) and seccomp(2). Trevor Woerner also noted that
there is an error in prctl(2), where it says that the filters
"are run in order until the first non-allow result is seen", which
contradicts the correct statement in seccomp(2) that *all* filters
are executed.
So, rewrite the seccomp material in prctl(2) to strip out most of
the content duplicated in seccomp(2), and replace the removed
text with statements deferring to to seccomp(2).
Reported-by: Trevor Woerner <twoerner@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some subsection (.SS) headings and paragraph breaks in
DESCRIPTION, to make the page more easily readable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc has tightened up its rules for replacing the memory
allocator. I went through the malloc man page and looked for how
it documented malloc() and related functions, and fixed
discrepancies with glibc malloc() documentation and/or
implementation. I also reorganized the portability discussion so
that portability issues can be seen more clearly.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Documentation/filesystems/sharedsubtree.txt has changed to
Documentation/filesystems/sharedsubtree.rst.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Especially the change to .rst format in the kernel Documentation/
tree has rendered many of the references in this manual page
obsolete. Fix them.
Reported-by: Vito Caputo <vcaputo@pengaru.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This makes it easier to compare this page to the standard,
to get more details about the rules between operators.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Unary operators are mentioned in C11::6.5.3, and casts are in
C11::6.5.4 (they are mentioned in order of precedence).
And from note 85 (in section 6.5) in that same C11 standard, major
subsections 6.5.X are sorted by precedence.
As an example (from Jakub), `sizeof(int)+1` is interpreted as
`(sizeof(int))+1`, and not `sizeof((int)+1)`.
I used C11 and not C18 (the latest) because at least in the draft
copy of C18 that I have, there are a few important typos in that
section, while the draft copy of C11 that I have is free of those
typos. And C11 and C18 are almost identical, with no major
changes to the language.
Reported-by: David Sletten <david.paul.sletten@gmail.com>
Cc: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Zero in this case refers to literal constant 0 and not symbolic
constant B0.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Jakub:
BTW, the exit_group.2 man page could use an update (possibly
by merging it into exit.2): it says that the "system
call is is equivalent to _exit(2) except that it terminates
not only the calling thread, but all threads in the calling
process's thread group", which isn't helpful these days.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous wording wasn't very explicit, leaving room for
believing that 'errno' may be 0 after returning EAI_SYSTEM.
Use a wording similar to other pages, for added consistency.
[mtk: edited commit message title; also, POSIX notes that
'errno' is set in this case.]
Reported-by: Cristian Morales Vega <christian.morales.vega@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Many times, this page use the terminology "mount point", where
"mount" would be better. A "mount point" is the location at which
a mount is attached. A "mount" is an association between a
filesystem and a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The correct terminology is "less privileged mount namespace"
(not "less privileged user namespace").
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The "Restrictions on mount namespaces" subsection belongs lower in
the page, following the discussion of concepts (e.g., shared
subtrees and propagation) that are discussed elsewhere in the page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous commit injected a large block of text into a list,
separating one example in the previous list item from a
"continuation" in the following list item. repair that.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For a long time, this manual page has had a brief discussion of
"locked" mounts, without clearly saying what this concept is, or
why it exists. Expand the discussion with an explanation of what
locked mounts are, why mounts are locked, and some examples of the
effect of locking.
Thanks to Christian Brauner for a lot of help in understanding
these details.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These pages split out extra errors for some APIs into a separate
list. Probably, the pages are easier to ready if all errors are
combined into a single list.
Note that there still remain a few pages where the errors are
listed separately for different APIs. For the moment, it seems
best to leave those pages as is, since the error lists are
largely distinct in those pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These split out errors into separate lists (perhaps per API,
perhaps "may" vs "shall", perhaps "Linux-specific" vs
standard(??)), but there's no good reason to do this. It makes
the error list harder to read, and is inconsistent with other
pages. So, combine the errors into a single list.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mainly in preparation for the following patch on project IDs maps.
Add some words that will make the parallels between the rules for
updating uid_map and projid_map clearer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Many times, these pages use the terminology "mount point", where
"mount" would be better. A "mount point" is the location at which
a mount is attached. A "mount" is an association between a
filesystem and a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 5.8 adds STATX_MNT_ID and stx_mnt_id.
Add description to statx.2
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email with Christian Brauner:
>>>>>> int fd_tree = open_tree(-EBADF, source,
>>>>>> OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC |
>>>>>> AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0));
>>>>>
>>>>> ???
>>>>> What is the significance of -EBADF here? As far as I can tell, it
>>>>> is not meaningful to open_tree()?
>>>>
>>>> I always pass -EBADF for similar reasons to [2]. Feel free to just use -1.
>>>
>>> ????
>>> But here, both -EBADF and -1 seem to be wrong. This argument
>>> is a dirfd, and so should either be a file descriptor or the
>>> value AT_FDCWD, right?
>>
>> [1]: In this code "source" is expected to be absolute. If it's not
>> absolute we should fail. This can be achieved by passing -1/-EBADF,
>> afaict.
>
> D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
> (And now I've done some adjustments to openat(2),which contains a
> rationale for the *at() functions.)
>
> So, now I understand your purpose, but still the code is obscure,
> since
>
> * You use a magic value (-EBADF) rather than (say) -1.
> * There's no explanation (comment about) of the fact that you want
> to prevent relative pathnames.
>
> So, I've changed the code to use -1, not -EBADF, and I've added some
> comments to explain that the intent is to prevent relative pathnames.
> Okay?
Sounds good.
>
> But, there is still the meta question: what's the problem with using
> a relative pathname?
Nothing per se. Ok, you asked so it's your fault:
When writing programs I like to never use relative paths with AT_FDCWD
because. Because making assumptions about the current working directory
of the calling process is just too easy to get wrong; especially when
pivot_root() or chroot() are in play.
My absolut preference (joke intended) is to open a well-known starting
point with an absolute path to get a dirfd and then scope all future
operations beneath that dirfd. This already works with old-style
openat() and _very_ cautious programming but openat2() and its
resolve-flag space have made this **chef's kiss**.
If I can't operate based on a well-known dirfd I use absolute paths with
a -EBADF dirfd passed to *at() functions.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email:
>> Thanks. I made it "detached". Elsewhere, the page already explains
>> that a detached mount is one that:
>>
>> must have been created by calling open_tree(2) with the
>> OPEN_TREE_CLONE flag and it must not already have been
>> visible in the filesystem.
>>
>> Which seems a fine explanation.
>>
>> ????
>> But, just a thought... "visible in the filesystem" seems not quite accurate.
>> What you really mean I guess is that it must not already have been
>> /visible in the filesystem hierarchy/previously mounted/something else/,
>> right?
I suppose that I should have clarified that my main problem was
that you were using the word "filesystem" in a way that I find
unconventional/ambiguous. I mean, I normally take the term
"filesystem" to be "a storage system for folding files".
Here, you are using "filesystem" to mean something else, what
I might call like "the single directory hierarchy" or "the
filesystem hierarchy" or "the list of mount points".
> A detached mount is created via the OPEN_TREE_CLONE flag. It is a
> separate new mount so "previously mounted" is not applicable.
> A detached mount is _related_ to what the MS_BIND flag gives you with
> mount(2). However, they differ conceptually and technically. A MS_BIND
> mount(2) is always visible in the fileystem when mount(2) returns, i.e.
> it is discoverable by regular path-lookup starting within the
> filesystem.
>
> However, a detached mount can be seen as a split of MS_BIND into two
> distinct steps:
> 1. fd_tree = open_tree(OPEN_TREE_CLONE): create a new mount
> 2. move_mount(fd_tree, <somewhere>): attach the mount to the filesystem
>
> 1. and 2. together give you the equivalent of MS_BIND.
> In between 1. and 2. however the mount is detached. For the kernel
> "detached" means that an anonymous mount namespace is attached to it
> which doen't appear in proc and has a 0 sequence number (Technically,
> there's a bit of semantical argument to be made that "attached" and
> "detached" are ambiguous as they could also be taken to mean "does or
> does not have a parent mount". This ambiguity e.g. appears in
> do_move_mount(). That's why the kernel itself calls it an "anonymous
> mount". However, an OPEN_TREE_CLONE-detached mount of course doesn't
> have a parent mount so it works.).
>
> For userspace it's better to think of detached and attached in terms of
> visibility in the filesystem or in a mount namespace. That's more
> straightfoward, more relevant, and hits the target in 90% of the cases.
>
> However, the better and clearer picture is to say that a
> OPEN_TREE_CLONE-detached mount is a mount that has never been
> move_mount()ed. Which in turn can be defined as the detached mount has
> never been made visible in a mount namespace. Once that has happened the
> mount is irreversibly an attached mount.
>
> I keep thinking that maybe we should just say "anonymous mount"
> everywhere. So changing the wording to:
I'm not against the word "detached". To user space, I think it is a
little more meaningful than "anonymous". For the moment, I'll stay with
"detached", but if you insist on "anonymous", I'll probably change it.
> [...]
> EINVAL The mount that is to be ID mapped is not an anonymous mount;
> that is, the mount has already been visible in a mount namespace.
I like that text *a lot* better! Thanks very much for suggesting
wordings. It makes my life much easier.
I've made the text:
EINVAL The mount that is to be ID mapped is not a detached
mount; that is, the mount has not previously been
visible in a mount namespace.
> [...]
> The mount must be an anonymous mount; that is, it must have been
> created by calling open_tree(2) with the OPEN_TREE_CLONE flag and it
> must not already have been visible in a mount namespace, i.e. it must
> not have been attached to the filesystem hierarchy with syscalls such
> as move_mount() syscall.
And that too! I've made the text:
• The mount must be a detached mount; that is, it must have
been created by calling open_tree(2) with the
OPEN_TREE_CLONE flag and it must not already have been
visible in a mount namespace. (To put things another way:
the mount must not have been attached to the filesystem
hierarchy with a system call such as move_mount(2).)
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email with Christian Braner:
> [1]: In this code "source" is expected to be absolute. If it's not
> absolute we should fail. This can be achieved by passing -1/-EBADF,
> afaict.
D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
(And now I've done some adjustments to openat(2),which contains a
rationale for the *at() functions.)
So, now I understand your purpose, but still the code is obscure,
since
* You use a magic value (-EBADF) rather than (say) -1.
* There's no explanation (comment about) of the fact that you want
to prevent relative pathnames.
So, I've changed the code to use -1, not -EBADF, and I've added some
comments to explain that the intent is to prevent relative pathnames.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the description of the EBADF error for invalid 'dirfd' more
uniform. In particular, note that the error only occurs when the
pathname is relative, and that it occurs when the 'dirfd' is
neither valid *nor* has the value AT_FDCWD.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, specifying an invalid file descriptor number
in 'dirfd' can be used as a check that 'pathname' is absolute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Kir Kolyshkin made a start, but I think much more needs to
be said...
Reviewed-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email conversation with Alexis:
Hello Alexis,
On 8/6/21 7:06 PM, Alexis Wilke wrote:
> Hi guys,
>
> The pthread_setname_np(3) manual page has an example where the second
> argument is used to get a size of the thread name.
>
> https://man7.org/linux/man-pages/man3/pthread_setname_np.3.html#EXAMPLES
>
> The current code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[1]) : NAMELEN);
>
> The suggested code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[2]) : NAMELEN);
I agree that there's a problem, but I think we could go even simpler:
rc = pthread_getname_np(thread, thread_name, NAMELEN);
> I'm thinking that maybe the author meant to compute the length like so:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? strlen(argv[1]) + 1 :
> NAMELEN);
>
> But I think that the atoi() points to using argv[2] as a number
> representing the length.
>
> (Of course, it should be tested against NAMELEN as a maximum, but I
> understand that examples do not always show how to verify each possible
> error).
I imagine that the author's intention was to allow the user to do
experiments where argv[2] specified a number less than NAMELEN,
in order to see the resulting ERANGE error. But, that experiment
is of limited value, and complicates the code unnecessarily, IMO,
so that's why I made the change above.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Reported-by: Alexis Wilke <alexis@m2osw.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Phrases such as "In the new mount API" will date fast. Remove it.
Also:
* Make it clear that MOUNT_ATTR__ATIME expresses a bit field.
* Replace 'enum' with 'enumeration'.
* Clarify what is meant by "partially" set MOUNT_ATTR__ATIME.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These types are already well described in mount_namespaces(7);
indeed, much of the text from that page seems to have just been
cut and pasted into this page! Simply referring the reader to
mount_namespaces(7) is sufficient.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Point out that this field can have the value zero, meaning
no change. And avoid discussions of 'enum', and simply say
that otherwise the field has one of the MS_* values.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Having this discussion under DESCRIPTION clutters that section,
and has the effect of burying the discussion of propagation. Move
the discussion to NOTES, to make the page more readable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Change some instances of "-" to "\"
- Use C99 style (declare variables nearer use in code)
- Add a bit of white space
- Remove one 'const...const' added by Alex that caused
compiler warnings
- Use "reverse Christmas tree" form for declarations in main()
- Other minor changes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We don't really need ext4(5) and xfs(5) here. They provide
no further info that is directly relevant to the reader of
mount_setattr(2).
clone3(2) isn't necessary because it is the same page as clone(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Fix SYNOPSIS to fit in 78 columns
Also, we don't show when an include is included for a specific type,
unless that header is included _only_ for the type,
or there might be confusion (e.g., termios).
Instead, that type should be documented in system_data_types(7),
with a link page mount_attr-struct(3).
- Fix references to mount_setattr(). See man-pages(7):
Any reference to the subject of the current manual page should be writ‐
ten with the name in bold followed by a pair of parentheses in Roman
(normal) font. For example, in the fcntl(2) man page, references to
the subject of the page would be written as: fcntl(). The preferred
way to write this in the source file is:
.BR fcntl ()
- Fix line breaks according to semantic newline rules (and add some commas)
- Fix wrong usage of .IR when .RI should have been used
- Fix formatting of variable part in FOO<number>:
- Make italic the variable part (as groff_man(7) recommends)
- Remove <>
- Use syntax recommended by G. Branden Robinson (groff)
- Fix unnecessary uses of .BR or .IR when .B or .I would suffice
- Fix formatting of punctuation
In some cases, it was in italics or bold, and it should always be in roman.
- Use uppercase to begin text, even in bullet points, since those were
multi-sentence.
- Simplify usage of .RS/.RE in combination with .IP
- s/fat/FAT/ as fs(7) does
- Slightly reword some sentences for consistency
- Use Linux-specific for consistency with other pages (in VERSIONS)
- EXAMPLES: Place the return type in a line of its own (as in other pages)
- Fix alignment of code
- Replace unnecessary use of the GNU extension ({}) by do {} while (0)
In that case, there was no return value (moreover, it's a noreturn).
- Break complex declaration lines into a line for each variable
The variables were being initialized, some to non-zero values,
so for clarity, a line for each one seems more appropriate.
- Add const to pointers when possible
- s/\\/\e/
- Remove unmatched groff commands
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the use of FUTEX_CLOCK_REALTIME for selecting the clock,
and eliminate repetition of details already covered in the
description of FUTEX_LOCK_PI.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
FUTEX_LOCK_PI2 is a new futex operation which was recently introduced into the
Linux kernel. It works exactly like FUTEX_LOCK_PI. However, it has support for
selectable clocks for timeouts. By default CLOCK_MONOTONIC is used. If
FUTEX_CLOCK_REALTIME is specified then the timeout is measured against
CLOCK_REALTIME.
This new operation addresses an inconsistency in the futex interface:
FUTEX_LOCK_PI only works with timeouts based on CLOCK_REALTIME in contrast to
all the other PI operations.
Document the FUTEX_LOCK_PI2 command.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Move example program to a new EXAMPLES section
- Invert logic in the handler to have the failure in the
conditional path, and the success out of any conditionals.
- Use NULL, EXIT_SUCCESS, and EXIT_FAILURE instead of magic numbers
- Separate declarations from code
- Put function return type on its own line
- Put function opening brace on its line
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=212385
some/path/dir/ is not always the same as some/path/dir/:
$ mkdir u
$ rmdir u/.
rmdir: failed to remove 'u/.': Invalid argument
$ rmdir u
$
The text in POSIX.1-2018 Section 4.13 ("Pathname Resolution")
is helpful in pointing to a better wording.
Reported-by: Askar Safin <safinaskar@mail.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Emanuele Torre via linux-man@:
[
I was reading the man page for ldd(1)[1]; and I read this in the first
paragraph of the DECRIPTION section:
ldd prints the shared objects (shared libraries) required by each
program or shared object specified on the command line. An
example of its use and output (using sed(1) to trim leading white
space for readability in this page) is the following:
$ ldd /bin/ls | sed 's/^ */ /'
linux-vdso.so.1 (0x00007ffcc3563000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f87e5459000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007f87e5254000)
libc.so.6 => /lib64/libc.so.6 (0x00007f87e4e92000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f87e4c22000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f87e4a1e000)
/lib64/ld-linux-x86-64.so.2 (0x00005574bf12e000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007f87e4817000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f87e45fa000)
This is a little confusing though since that sed(1) command does not
seem to work. (and also potentially misleading for someone who is trying
figure out how to parse ldd(1)'s output.)
ldd(1) prepends a TAB character (0x09) to each line, not spaces:
$ ldd /bin/ls | xxd | head -1
00000000: 096c 696e 7578 2d76 6473 6f2e 736f 2e31 .linux-vdso.so.1
I read ldd(1)'s source code[2] (it is part of glibc) and it seems to be
a bash script that tries to use different rtld programs ( ld.so(8) )
from an RTLDLIST.
Those, on my system, are:
* /usr/lib/ld-linux.so.2
* /usr/lib64/ld-linux-x86-64.so.2
* /usr/libx32/ld-linux-x32.so.2
And they all seem to also be part of glibc.
I have tried to follow the git history of glibc to see when the switch
from spaces to the TAB character occured, but, to me, it seems like
glibc.git/elf/rtld.c has always used '\t'; at since
6a76c115150318eae5d02eca76f2fc03be7bd029[3] (358th commit since glibc
started using the git repository repository - Nov 18th 1995): before
that commit there are not any results for `git grep '\\t'` in the elf
directory and I did not investigate further.
Still, at the time of that commit, glibc did not seem to have an ldd(1)
utility.
Perhaps the man page is old and its original author was using and
documenting an ldd(1) utility that was not part of glibc when he was
writing it.
Anyhow, since I think that sed(1) command will not work on any system
that uses, at least, the most recent version of glibc (because lld(1)
and the ld.so(8) programs it depends on are all part of glibc), I think
that that example should be changed to avoid confusions.
The output format of ldd(1) does not seem to be clearly defined, so I
think this would be a good option:
$ ldd /bin/ls | sed 's/^[[:space:]]*/ /'
NB: ^\s* should also work on most GNU/Linux systems, but \s is
non-standard or documented so I don not suggest using it in the man
page.
Another option could be to remove "the pipe to sed(1)" part and the note
in parentheses that explains why it was used by the original author.
Cheers.
emanuele6
[1]: https://man7.org/linux/man-pages/man1/ldd.1.html
[2]: https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/ldd.bash.in;h=ba736464ac5e4a9390b1b6a39595035238250232;hb=5188a9d0265cc6f7235a8af1d31ab02e4a24853d
[3]: https://sourceware.org/git/?p=glibc.git;a=commit;h=6a76c115150318eae5d02eca76f2fc03be7bd029
///////
$ uname -a
Linux t420 5.10.54-1-lts #1 SMP Wed, 28 Jul 2021 15:05:20 +0000
x86_64 GNU/Linux
$ pacman -Qo ldd
/usr/bin/ldd is owned by glibc 2.33-5
$ pacman -Qo /usr/share/man/man1/ldd.1.gz
/usr/share/man/man1/ldd.1.gz is owned by man-pages 5.12-2
$ pacman -Qo /usr/lib/ld-linux.so.2
/usr/lib/ld-linux.so.2 is owned by lib32-glibc 2.33-5
$ pacman -Qo /usr/lib64/ld-linux-x86-64.so.2
/usr/lib/ld-linux-x86-64.so.2 is owned by glibc 2.33-5
$ pacman -F /usr/libx32/ld-linux-x32.so.2 || echo not available on arch linux.
not available on arch linux.
]
Reported-by: EmanueleTorre <torreemanuele6@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain that `optstring` cannot contain a semi-colon (`;`)
character.
[mtk: verified with a small test program; see also posix/getopt.c
in the glibc sources:
if (temp == NULL || c == ':' || c == ';')
{
if (print_errors)
fprintf (stderr, _("%s: invalid option -- '%c'\n"), argv[0], c);
d->optopt = c;
return '?';
}
]
Also explain that `optstring` can include `+` as an option
character, possibly in addition to that character being used as
the first character in `optstring` to denote `POSIXLY_CORRECT`
behaviour.
[mtk: verified with a small test program.]
Test program below. Example runs:
$ ./a.out -+
opt = 43 (+); optind = 2
Got plus
$ ./a.out -';'
./a.out: invalid option -- ';'
opt = 63 (?); optind = 2; optopt = 59 (;)
Unrecognized option (-;)
Usage: ./a.out [-p arg] [-x]
Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
#include <ctype.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define printable(ch) (isprint((unsigned char) ch) ? ch : '#')
static void /* Print "usage" message and exit */
usageError(char *progName, char *msg, int opt)
{
if (msg != NULL && opt != 0)
fprintf(stderr, "%s (-%c)\n", msg, printable(opt));
fprintf(stderr, "Usage: %s [-p arg] [-x]\n", progName);
exit(EXIT_FAILURE);
}
int
main(int argc, char *argv[])
{
int opt, xfnd;
char *pstr;
xfnd = 0;
pstr = NULL;
while ((opt = getopt(argc, argv, "p:x+;")) != -1) {
printf("opt =%3d (%c); optind = %d", opt, printable(opt), optind);
if (opt == '?' || opt == ':')
printf("; optopt =%3d (%c)", optopt, printable(optopt));
printf("\n");
switch (opt) {
case 'p': pstr = optarg; break;
case 'x': xfnd++; break;
case ';': printf("Got semicolon\n"); break;
case '+': printf("Got plus\n"); break;
case ':': usageError(argv[0], "Missing argument", optopt);
case '?': usageError(argv[0], "Unrecognized option", optopt);
default:
printf("Unexpected case in switch()\n");
exit(EXIT_FAILURE);
}
}
if (xfnd != 0)
printf("-x was specified (count=%d)\n", xfnd);
if (pstr != NULL)
printf("-p was specified with the value \"%s\"\n", pstr);
if (optind < argc)
printf("First nonoption argument is \"%s\" at argv[%d]\n",
argv[optind], optind);
exit(EXIT_SUCCESS);
}
Do not include unused (and incompatible) header file termios.h and
include required header files for puts() and close() functions.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SPARC is special, it does not have Bnnn constants for baud rates above
2000000. Instead it defines 4 Bnnn constants with smaller baud rates.
This difference between SPARC and non-SPARC architectures is present in
both glibc API (termios.h) and also kernel ioctl API (asm/termbits.h).
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Format variable parts of words referring to a group of identifiers
in italics, following groff_man(7) recommendations.
Also srcfix surrounding uses of \f escape sequences to use macros
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These baud-rate macro constants are defined in bits/termios.h and are
already supported.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Saw this while preparing the "switch to \~" change Alex invited.
Signed-off-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After a patch proposal from наб triggered by concerns that, when
talking about PIPE_BUF, pipe(7) explicitly mentions write(2) but
not writev(2), I've concluded that the reference in writev(2) to
pipe(7) is not needed (mea culpa; I added that text), and I think
the text in pipe(7) could be written to be closer to the POSIX
spec, which doesn't talk about "write() calls", but simply about
"writes".
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mainly: I generally don't want us to be including URLs to mailing
list discussions in a manual page. Either, the issue in the
discussion is worth writing up in the manual page (so that
the reader doesn't have to look elsewhere), or the details
are less important, in which case it is sufficient to note the
existence of the bug. I think this is an example of the latter.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The correct kernel version seems to 5.11, not 5.10:
$ git describe --contains d0e3fc69d00d
v5.11-rc1~76^2~251
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christophe Leroy via Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213421
[
In ppc32 functions section, the Y2038 compliant function
__kernel_clock_gettime64() is missing.
It was added by commit d0e3fc69d00d
("powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32")
]
.../linux$ git describe d0e3fc69d00d
v5.10-rc2-76-gd0e3fc69d00d
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As pointed out by Nora, the example shown in the manual
page already demonstrates that the pathname is not absolute!
Reported-by: Nora Platiel <nplatiel@gmx.us>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove duplicated word.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a typo in the documentation of using fallocate to allocate shared
blocks. The flag FALLOC_FL_UNSHARE should instead be documented as
FALLOC_FL_UNSHARE_RANGE.
Fixes: 63a599c657 ("man2/fallocate.2: Document behavior with shared blocks")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Alan:
[
The on-line copy of the manual page "posixoptions(7)" dated
2018-04-30 has an entry for "getcwd()" in the section headed
"XSI - _XOPEN_LEGACY - _SC_XOPEN_LEGACY".
I believe that entry should be "getwd()" as that is the API call
which was present in X/Open-6 but withdrawn in X/Open-7.
]
mtk: confirmed by reviewing the table ("Removed Functions and
Symbols in Issue 7") at the end of Section B.1.1 on page
3564 of IEEE Std 1003.1, 2016 Edition.
Reported-by: Alan Peakall <Alan.Peakall@helpsystems.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=213419
ppc/32 and ppc/64 sections both have the following note:
The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are
not supported by the __kernel_clock_getres and
__kernel_clock_gettime interfaces; the kernel falls back to the
real system call
This note has been wrong from quite some time now, since commit
654abc69ef2e ("powerpc/vdso32: Add support for
CLOCK_{REALTIME/MONOTONIC}_COARSE") and commit
5c929885f1bb ("powerpc/vdso64: Add support for
CLOCK_{REALTIME/MONOTONIC}_COARSE")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since commit ee81d7e418, the flags list has been (only) above, not
below, these references.
(The flags table was added even before that, in commit 0b497138b9
("namespaces.7: Add table of namespaces to top of page"))
Fixes: ee81d7e418 ("namespaces.7: Include manual page references in the summary table of namespace types")
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Correct function signature by adding missing parenthesis.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I regularly get mildly lost in this table (and, indeed, didn't realise
it had two columns the first few times I used it to look at something
from the left column) ‒ separating the two columns improves clarity,
and makes which soup of numbers belongs to which character
much more obvious
Other encodings don't need this as they don't use double-columnated
tables
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The recv.2 misspelled `SO_EE_OFFENDER` to `SOCK_EE_OFFENDER`.
This patch fix this typo.
Signed-off-by: kXuan <kxuanobj@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file being referred to no longer exists, as it was moved to
*.rst first (commit 20a78ae9ed297f2) and then to under
admin-guide (commit bf6b7a742e3f82b). Both those commits
are from 2019 (Linux 5.3).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using mount flag `MS_NOSUID` also affects SELinux domain transitions but
this has not been documented well.
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With a simple backslash, '\0' ended up as ' ' in the man output.
Reported-by: Štěpán Němec <stepnem@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
No implementation or spec requires *n to be 0 to allocate a new buffer:
* musl checks for !*lineptr
(and sets *n=0 for later allocations)
* glibc checks for !*lineptr || !*n
(but only because it allocates early)
* NetBSD checks for !*lineptr
(and sets *n=0 for later allocations)
(but specifies *n => mlen(*lineptr) >= *n as a precondition,
to which this appears to be an exception)
* FreeBSD checks for !*lineptr and sets *n=0
(and specifies !*lineptr as sufficient)
* Lastly, POSIX.1-2017 specifies:
> If *n is non-zero, the application shall ensure that *lineptr
> either points to an object of size at least *n bytes,
> or is a null pointer.
The new wording matches POSIX, even if it arrives at the point slightly
differently
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Don't document includes that provide types; only those that
provide prototypes and constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The types that need <sys/types.h> are better documented in
system_data_types(7). Let's keep only the includes for the
prototypes and the constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct iovec' is defined in <bits/types/struct_iovec.h>,
which is included by <sys/io.h>, but it is also included by
<bits/fcntl-linux.h>, which is in the end included by <fcntl.h>.
Given that we already include <fcntl.h>, we don't need any more
includes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct utimbuf' is provided by <utime.h>.
There's no need for <sys/types.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/types.h> makes no sense for a function that only uses 'int'.
The flags used by this function are provided by <fcntl.h>
(or others), but not by <linux/userfaultfd.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'mode_t', which is the only reason this might have been ever
needed, is provided by <sys/stat.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'off_t', which is the only reason this might have been ever
needed, is provided by <unistd.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There seems to be no reason to include <unistd.h>.
<sys/swap.h> already provides both the function prototypes and the
SWAP_* constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<unistd.h> doesn't seem to be needed:
AT_* constants come from <fcntl.h>
STATX_* constants come from <sys/stat.h>
'struct statx' comes from <sys/stat.h>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove <sys/types.h>; ffix too
<sys/types.h> is only needed for 'struct stat'.
That is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A function declarator with empty parentheses, which is not a
prototype, is an obsolescent feature of C (See C17 6.11.6.1), and
doesn't mean 0 parameters, but instead that no information about
the parameters is provided (See C17 6.5.2.2).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's only needed for getting 'mode_t'.
But that type is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Doing so decreases the degree to which text is indented, and
thus avoids short, poorly wrapped lines.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Jann Horn:
[[
As discussed at
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
we need to re-check checkNotificationIdIsValid() after reading remote
memory but before using the read value in any way. Otherwise, the
syscall could in the meantime get interrupted by a signal handler, the
signal handler could return, and then the function that performed the
syscall could free() allocations or return (thereby freeing buffers on
the stack).
In essence, this pread() is (unavoidably) a potential use-after-free
read; and to make that not have any security impact, we need to check
whether UAF read occurred before using the read value. This should
probably be called out elsewhere in the manpage, too...
Now, of course, **reading** is the easy case. The difficult case is if
we have to **write** to the remote process... because then we can't
play games like that. If we write data to a freed pointer, we're
screwed, that's it. (And for somewhat unrelated bonus fun, consider
that /proc/$pid/mem is originally intended for process debugging,
including installing breakpoints, and will therefore happily write
over "readonly" private mappings, such as typical mappings of
executable code.)
So, uuuuh... I guess if anyone wants to actually write memory back to
the target process, we'd better come up with some dedicated API for
that, using an ioctl on the seccomp fd that magically freezes the
target process inside the syscall while writing to its memory, or
something like that? And until then, the manpage should have a big fat
warning that writing to the target's memory is simply not possible
(safely).
]]
and
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>:
[[
The second bit of trouble is that if the supervisor is so oblivious
that it doesn't realize that syscalls can be interrupted, it'll run
into other problems. Let's say the target process does something like
this:
int func(void) {
char pathbuf[4096];
sprintf(pathbuf, "/tmp/blah.%d", some_number);
mount("foo", pathbuf, ...);
}
and mount() is handled with a notification. If the supervisor just
reads the path string and immediately passes it into the real mount()
syscall, something like this can happen:
target: starts mount()
target: receives signal, aborts mount()
target: runs signal handler, returns from signal handler
target: returns out of func()
supervisor: receives notification
supervisor: reads path from remote buffer
supervisor: calls mount()
but because the stack allocation has already been freed by the time
the supervisor reads it, the supervisor just reads random garbage, and
beautiful fireworks ensue.
So the supervisor *fundamentally* has to be written to expect that at
*any* time, the target can abandon a syscall. And every read of remote
memory has to be separated from uses of that remote memory by a
notification ID recheck.
And at that point, I think it's reasonable to expect the supervisor to
also be able to handle that a syscall can be aborted before the
notification is delivered.
]]
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Rename the function that does the SECCOMP_IOCTL_NOTIF_ID_VALID
check.
- Make that function return a 'bool' rather than terminating the
process.
- Use that return value in the calling function.
- Rework/improve various related comments.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pidfd_open(2) and pidfd_getfd(2) presumably have use cases
with the user-space notification feature.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
According to Tycho Andersen, he had no particular use case
in mind when building this detail into the API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The seccomp user-space notification feature can cause changes in
the semantics of SA_RESTART with respect to system calls that
would never normally be restarted. Point the reader to the page
that provide further details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And, as noted by Jann Horn, note how the user-space notification
mechanism causes a small breakage in the user-space API with
respect to nonrestartable system calls.
====
From the email discussion with Jann Horn
> >> So, I partially demonstrated what you describe here, for two example
> >> system calls (epoll_wait() and pause()). But I could not exactly
> >> demonstrate things as I understand you to be describing them. (So,
> >> I'm not sure whether I have not understood you correctly, or
> >> if things are not exactly as you describe them.)
> >>
> >> Here's a scenario (A) that I tested:
> >>
> >> 1. Target installs seccomp filters for a blocking syscall
> >> (epoll_wait() or pause(), both of which should never restart,
> >> regardless of SA_RESTART)
> >> 2. Target installs SIGINT handler with SA_RESTART
> >> 3. Supervisor is sleeping (i.e., is not blocked in
> >> SECCOMP_IOCTL_NOTIF_RECV operation).
> >> 4. Target makes a blocking system call (epoll_wait() or pause()).
> >> 5. SIGINT gets delivered to target; handler gets called;
> >> ***and syscall gets restarted by the kernel***
> >>
> >> That last should never happen, of course, and is a result of the
> >> combination of both the user-notify filter and the SA_RESTART flag.
> >> If one or other is not present, then the system call is not
> >> restarted.
> >>
> >> So, as you note below, the UAPI gets broken a little.
> >>
> >> However, from your description above I had understood that
> >> something like the following scenario (B) could occur:
> >>
> >> 1. Target installs seccomp filters for a blocking syscall
> >> (epoll_wait() or pause(), both of which should never restart,
> >> regardless of SA_RESTART)
> >> 2. Target installs SIGINT handler with SA_RESTART
> >> 3. Supervisor performs SECCOMP_IOCTL_NOTIF_RECV operation (which
> >> blocks).
> >> 4. Target makes a blocking system call (epoll_wait() or pause()).
> >> 5. Supervisor gets seccomp user-space notification (i.e.,
> >> SECCOMP_IOCTL_NOTIF_RECV ioctl() returns
> >> 6. SIGINT gets delivered to target; handler gets called;
> >> and syscall gets restarted by the kernel
> >> 7. Supervisor performs another SECCOMP_IOCTL_NOTIF_RECV operation
> >> which gets another notification for the restarted system call.
> >>
> >> However, I don't observe such behavior. In step 6, the syscall
> >> does not get restarted by the kernel, but instead returns -1/EINTR.
> >> Perhaps I have misconstructed my experiment in the second case, or
> >> perhaps I've misunderstood what you meant, or is it possibly the
> >> case that things are not quite as you said?
>
> Thanks for the code, Jann (including the demo of the CLONE_FILES
> technique to pass the notification FD to the supervisor).
>
> But I think your code just demonstrates what I described in
> scenario A. So, it seems that I both understood what you
> meant (because my code demonstrates the same thing) and
> also misunderstood what you said (because I thought you
> were meaning something more like scenario B).
Ahh, sorry, I should've read your mail more carefully. Indeed, that
testcase only shows scenario A. But the following shows scenario B...
[Below, two pieces of code from Jann, with a lot of
cosmetic changes by mtk.]
====
[And from a follow-up in the same email thread:]
> If userspace relies on non-restarting behavior, it should be using
> something like epoll_pwait(). And that stuff only unblocks signals
> after we've already past the seccomp checks on entry.
Thanks for elaborating that detail, since as soon as you talked
about "enlarging a preexisting race" above, I immediately wondered
sigsuspend(), pselect(), etc.
(Mind you, I still wonder about the effect on system calls that
are normally nonrestartable because they have timeouts. My
understanding is that the kernel doesn't restart those system
calls because it's impossible for the kernel to restart the call
with the right timeout value. I wonder what happens when those
system calls are restarted in the scenario we're discussing.)
Anyway, returning to your point... So, to be clear (and to
quickly remind myself in case I one day reread this thread),
there is not a problem with sigsuspend(), pselect(), ppoll(),
and epoll_pwait() since:
* Before the syscall, signals are blocked in the target.
* Inside the syscall, signals are still blocked at the time
the check is made for seccomp filters.
* If a seccomp user-space notification event kicks, the target
is put to sleep with the signals still blocked.
* The signal will only get delivered after the supervisor either
triggers a spoofed success/failure return in the target or the
supervisor sends a CONTINUE response to the kernel telling it
to execute the target's system call. Either way, there won't be
any restarting of the target's system call (and the supervisor
thus won't see multiple notifications).
====
Scenario A
$ ./seccomp_unotify_restart_scen_A
C: installed seccomp: fd 3
C: woke 1 waiters
P: child installed seccomp fd 3
C: About to call pause(): Success
P: going to send SIGUSR1...
C: sigusr1_handler handler invoked
P: about to terminate
C: got pdeath signal on parent termination
C: about to terminate
/* Modified version of code from Jann Horn */
#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <err.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <stddef.h>
#include <limits.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/futex.h>
struct {
int seccomp_fd;
} *shared;
static void
sigusr1_handler(int sig, siginfo_t * info, void *uctx)
{
printf("C: sigusr1_handler handler invoked\n");
}
static void
sigusr2_handler(int sig, siginfo_t * info, void *uctx)
{
printf("C: got pdeath signal on parent termination\n");
printf("C: about to terminate\n");
exit(0);
}
int
main(void)
{
setbuf(stdout, NULL);
/* Allocate memory that will be shared by parent and child */
shared = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (shared == MAP_FAILED)
err(1, "mmap");
shared->seccomp_fd = -1;
/* glibc's clone() wrapper doesn't support fork()-style usage */
/* Child process and parent share file descriptor table */
pid_t child = syscall(__NR_clone, CLONE_FILES | SIGCHLD,
NULL, NULL, NULL, 0);
if (child == -1)
err(1, "clone");
/* CHILD */
if (child == 0) {
/* don't outlive the parent */
prctl(PR_SET_PDEATHSIG, SIGUSR2);
if (getppid() == 1)
exit(0);
/* Install seccomp filter */
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct sock_filter insns[] = {
BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_pause, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
};
struct sock_fprog prog = {
.len = sizeof(insns) / sizeof(insns[0]),
.filter = insns
};
int seccomp_ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
if (seccomp_ret < 0)
err(1, "install");
printf("C: installed seccomp: fd %d\n", seccomp_ret);
/* Place the notifier FD number into the shared memory */
__atomic_store(&shared->seccomp_fd, &seccomp_ret,
__ATOMIC_RELEASE);
/* Wake the parent */
int futex_ret =
syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAKE,
INT_MAX, NULL, NULL, 0);
printf("C: woke %d waiters\n", futex_ret);
/* Establish SA_RESTART handler for SIGUSR1 */
struct sigaction act = {
.sa_sigaction = sigusr1_handler,
.sa_flags = SA_RESTART | SA_SIGINFO
};
if (sigaction(SIGUSR1, &act, NULL))
err(1, "sigaction");
struct sigaction act2 = {
.sa_sigaction = sigusr2_handler,
.sa_flags = 0
};
if (sigaction(SIGUSR2, &act2, NULL))
err(1, "sigaction");
/* Make a blocking system call */
perror("C: About to call pause()");
pause();
perror("C: pause returned");
exit(0);
}
/* PARENT */
/* Wait for futex wake-up from child */
int futex_ret = syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAIT,
-1, NULL, NULL, 0);
if (futex_ret == -1 && errno != EAGAIN)
err(1, "futex wait");
/* Get notification FD from the child */
int fd = __atomic_load_n(&shared->seccomp_fd, __ATOMIC_ACQUIRE);
printf("\tP: child installed seccomp fd %d\n", fd);
sleep(1);
printf("\tP: going to send SIGUSR1...\n");
kill(child, SIGUSR1);
sleep(1);
printf("\tP: about to terminate\n");
exit(0);
}
====
Scenario B
$ ./seccomp_unotify_restart_scen_B
C: installed seccomp: fd 3
C: woke 1 waiters
C: About to call pause()
P: child installed seccomp fd 3
P: about to SECCOMP_IOCTL_NOTIF_RECV
P: got notif: id=17773741941218455591 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455592 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455593 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455594 pid=25052 nr=34
P: about to send SIGUSR1 to child...
C: sigusr1_handler handler invoked
C: got pdeath signal on parent termination
C: about to terminate
/* Modified version of code from Jann Horn */
#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <err.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <limits.h>
#include <inttypes.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/futex.h>
struct {
int seccomp_fd;
} *shared;
static void
sigusr1_handler(int sig, siginfo_t * info, void *uctx)
{
printf("C: sigusr1_handler handler invoked\n");
}
static void
sigusr2_handler(int sig, siginfo_t * info, void *uctx)
{
printf("C: got pdeath signal on parent termination\n");
printf("C: about to terminate\n");
exit(0);
}
static size_t
max_size(size_t a, size_t b)
{
return (a > b) ? a : b;
}
int
main(void)
{
setbuf(stdout, NULL);
/* Allocate memory that will be shared by parent and child */
shared = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (shared == MAP_FAILED)
err(1, "mmap");
shared->seccomp_fd = -1;
/* glibc's clone() wrapper doesn't support fork()-style usage */
/* Child process and parent share file descriptor table */
pid_t child = syscall(__NR_clone, CLONE_FILES | SIGCHLD,
NULL, NULL, NULL, 0);
if (child == -1)
err(1, "clone");
/* CHILD */
if (child == 0) {
/* don't outlive the parent */
prctl(PR_SET_PDEATHSIG, SIGUSR2);
if (getppid() == 1)
exit(0);
/* Install seccomp filter */
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct sock_filter insns[] = {
BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_pause, 0, 1),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
};
struct sock_fprog prog = {
.len = sizeof(insns) / sizeof(insns[0]),
.filter = insns
};
int seccomp_ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
if (seccomp_ret < 0)
err(1, "install");
printf("C: installed seccomp: fd %d\n", seccomp_ret);
/* Place the notifier FD number into the shared memory */
__atomic_store(&shared->seccomp_fd, &seccomp_ret,
__ATOMIC_RELEASE);
/* Wake the parent */
int futex_ret =
syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAKE,
INT_MAX, NULL, NULL, 0);
printf("C: woke %d waiters\n", futex_ret);
/* Establish SA_RESTART handler for SIGUSR1 */
struct sigaction act = {
.sa_sigaction = sigusr1_handler,
.sa_flags = SA_RESTART | SA_SIGINFO
};
if (sigaction(SIGUSR1, &act, NULL))
err(1, "sigaction");
struct sigaction act2 = {
.sa_sigaction = sigusr2_handler,
.sa_flags = 0
};
if (sigaction(SIGUSR2, &act2, NULL))
err(1, "sigaction");
/* Make a blocking system call */
printf("C: About to call pause()\n");
pause();
perror("C: pause returned");
exit(0);
}
/* PARENT */
/* Wait for futex wake-up from child */
int futex_ret = syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAIT,
-1, NULL, NULL, 0);
if (futex_ret == -1 && errno != EAGAIN)
err(1, "futex wait");
/* Get notification FD from the child */
int fd = __atomic_load_n(&shared->seccomp_fd, __ATOMIC_ACQUIRE);
printf("\tP: child installed seccomp fd %d\n", fd);
/* Discover seccomp buffer sizes and allocate notification buffer */
struct seccomp_notif_sizes sizes;
if (syscall(__NR_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &sizes))
err(1, "notif_sizes");
struct seccomp_notif *notif =
malloc(max_size(sizeof(struct seccomp_notif),
sizes.seccomp_notif));
if (!notif)
err(1, "malloc");
for (int i = 0; i < 4; i++) {
printf("\tP: about to SECCOMP_IOCTL_NOTIF_RECV\n");
memset(notif, '\0', sizes.seccomp_notif);
if (ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, notif))
err(1, "notif_recv");
printf("\tP: got notif: id=%llu pid=%u nr=%d\n",
notif->id, notif->pid, notif->data.nr);
sleep(1);
printf("\tP: about to send SIGUSR1 to child...\n");
kill(child, SIGUSR1);
}
sleep(1);
exit(0);
}
====
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the usual case, read(fd, buf, PATH_MAX) will return PATH_MAX
bytes that include trailing garbage after the pathname. So the
right check is to scan from the start of the buffer to see if
there's a NUL, and error if there is not.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After some discussions with Jann Horn, perhaps a better way of
dealing with an invalid target pathname is to trigger an
error for the system call.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Jann Horn:
[[
>>>> struct seccomp_notif_resp *resp = malloc(sizes.seccomp_notif_resp);
>>>
>>> This should probably do something like max(sizes.seccomp_notif_resp,
>>> sizeof(struct seccomp_notif_resp)) in case the program was built
>>> against new UAPI headers that make struct seccomp_notif_resp big, but
>>> is running under an old kernel where that struct is still smaller?
>>
>> I'm confused. Why? I mean, if the running kernel says that it expects
>> a buffer of a certain size, and we allocate a buffer of that size,
>> what's the problem?
>
> Because in userspace, we cast the result of malloc() to a "struct
> seccomp_notif_resp *". If the kernel tells us that it expects a size
> smaller than sizeof(struct seccomp_notif_resp), then we end up with a
> pointer to a struct that consists partly of allocated memory, partly
> of out-of-bounds memory, which is generally a bad idea - I'm not sure
> whether the C standard permits that. And if userspace then e.g.
> decides to access some member of that struct that is beyond what the
> kernel thinks is the struct size, we get actual OOB memory accesses.
Got it. (But gosh, this seems like a fragile API mess.)
I added the following to the code:
/* When allocating the response buffer, we must allow for the fact
that the user-space binary may have been built with user-space
headers where 'struct seccomp_notif_resp' is bigger than the
response buffer expected by the (older) kernel. Therefore, we
allocate a buffer that is the maximum of the two sizes. This
ensures that if the supervisor places bytes into the response
structure that are past the response size that the kernel expects,
then the supervisor is not touching an invalid memory location. */
size_t resp_size = sizes.seccomp_notif_resp;
if (sizeof(struct seccomp_notif_resp) > resp_size)
resp_size = sizeof(struct seccomp_notif_resp);
struct seccomp_notif_resp *resp = malloc(resp_size);
]]
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Jann Horn:
>> We should probably make sure here that the value we read is actually
>> NUL-terminated?
>
> So, I was curious about that point also. But, (why) are we not
> guaranteed that it will be NUL-terminated?
Because it's random memory filled by another process, which we don't
necessarily trust. While seccomp notifiers aren't usable for applying
*extra* security restrictions, the supervisor will still often be more
privileged than the supervised process.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change "read(2) will return 0" to "read(2) may return 0".
Quoting Jann Horn:
Maybe make that "may return 0" instead of "will return 0" -
reading from /proc/$pid/mem can only return 0 in the
following cases AFAICS:
1. task->mm was already gone at open() time
2. mm->mm_users has dropped to zero (the mm only has lazytlb
users; page tables and VMAs are being blown away or have
been blown away)
3. the syscall was called with length 0
When a process has gone away, normally mm->mm_users will
drop to zero, but someone else could theoretically still be
holding a reference to the mm (e.g. someone else in the
middle of accessing /proc/$pid/mem). (Such references
should normally not be very long-lived though.)
Additionally, in the unlikely case that the OOM killer just
chomped through the page tables of the target process, I
think the read will return -EIO (same error as if the
address was simply unmapped) if the address is within a
non-shared mapping. (Maybe that's something procfs could do
better...)
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some strongly worded text warning the reader about the correct
uses of seccomp user-space notification.
Reported-by: Jann Horn <jannh@google.com>
Cowritten-by: Christian Brauner <christian@brauner.io>
Cowritten-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The APIs used by this mechanism comprise not only seccomp(2), but
also a number of ioctl(2) operations. And any useful example
demonstrating these APIs is will necessarily be rather long.
Trying to cram all of this into the seccomp(2) page would make
that page unmanageably long. Therefore, let's document this
mechanism in a separate page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text says the structures (plural!) contain a 'struct
seccomp_data'. But this is only true for the received notification
structure (seccomp_notif). So, reword the sentence to be more
general, noting simply that the structures may evolve over time.
Add some comments to the structure definition.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rework the description a little, and note that the close-on-exec
flag is set for the returned file descriptor.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I can't see a reason to include it. <fcntl.h> provides O_*
constants for 'flags', S_* constants for 'mode', and mode_t.
Probably a long time ago, some of those weren't defined in
<fcntl.h>, and both headers needed to be included, or maybe it's
a historical error.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
My previous patch intended to drop the docs for the lockdown lift
SysRq, but it missed this other section that refers to lifting it
via a keyboard - an allusion to that same SysRq.
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The patch that implemented lockdown lifting via SysRq ended up
getting dropped[*] before the feature was merged upstream. Having
the feature documented but unsupported has caused some confusion
for our users.
[*] http://archive.lwn.net:8080/linux-kernel/CACdnJuuxAM06TcnczOA6NwxhnmQUeqqm3Ma8btukZpuCS+dOqg@mail.gmail.com/
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Pedro Principeza <pedro.principeza@canonical.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kyle McMartin <kyle@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Instead of having a monolithic 'make install', break it into
multiple targets such as 'make install-man3'. This simplifies
packaging, for example in Debian, where they break this project
into several packages: 'manpages' and 'manpages-dev', each
containing different mandirs.
The above allows for multithread installation: 'make -j'
Also, don't overwrite files that don't need to be overwritten, by
having a target for files, which makes use of make's timestamp
comparison.
This allows for much faster installation times.
For comparison, on my laptop (i7-8850H; 6C/12T):
Old Makefile:
~/src/linux/man-pages$ time sudo make >/dev/null
real 0m7.509s
user 0m5.269s
sys 0m2.614s
The times with the old makefile, varied a lot, between
5 and 10 seconds. The times after applying this patch
are much more consistent. BTW, I compared these times to
the very old Makefile of man-pages-5-09, and those were
around 3.5 s, so it was a bit of my fault to have such a
slow Makefile, when I changed the Makefile some weeks ago.
New Makefile (full clean install):
~/src/linux/man-pages$ time sudo make >/dev/null
real 0m5.160s
user 0m4.326s
sys 0m1.137s
~/src/linux/man-pages$ time sudo make -j2 >/dev/null
real 0m1.602s
user 0m2.529s
sys 0m0.289s
~/src/linux/man-pages$ time sudo make -j >/dev/null
real 0m1.398s
user 0m2.502s
sys 0m0.281s
Here we can see that 'make -j' drops times drastically,
compared to the old monolithic Makefile. Not only that,
but since when we are working with the man pages there
aren't many pages involved, times will be even better.
Here are some times with a single page changed (touched):
New Makefile (one page touched):
~/src/linux/man-pages$ touch man2/membarrier.2
~/src/linux/man-pages$ time sudo make install
- INSTALL /usr/local/share/man/man2/membarrier.2
real 0m0.988s
user 0m0.966s
sys 0m0.025s
~/src/linux/man-pages$ touch man2/membarrier.2
~/src/linux/man-pages$ time sudo make install -j
- INSTALL /usr/local/share/man/man2/membarrier.2
real 0m0.989s
user 0m0.943s
sys 0m0.049s
Also, modify the output of the make install and uninstall commands
so that a line is output for each file or directory that is
installed, similarly to the kernel's Makefile. This doesn't apply
to html targets, which haven't been changed in this commit.
Also, make sure that for each invocation of $(INSTALL_DIR), no
parents are created, (i.e., avoid `mkdir -p` behavior). The GNU
make manual states that it can create race conditions. Instead,
declare as a prerequisite for each directory its parent directory,
and let make resolve the order of creation.
Also, use ':=' instead of '=' to improve performance, by
evaluating each assignment only once.
Ensure than the shell is not called when not needed, by removing
all ";" and quotes in the commands.
See also: <https://stackoverflow.com/q/67862417/6872717>
Specify conventions and rationales used in the Makefile in a comment.
Add copyright.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document also why each header is required
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This function doesn't use any flags or special types, so there's
no reason to include <asm/unistd.h>; remove it. Add the includes
needed for syscall(2) only.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Also document why each header is needed.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This error can occur if the caller is does not have CAP_IPC_LOCK
and is not a member of the sysctl_hugetlb_shm_group.
Reported-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As a deprecated feature, it appears that the RLIMIT_MEMLOCK
can also be used to permit huge page allocation, but let's
not document that for now.
In the Linux 5.12, see fs/hugetlbfs/inode.c.
static int can_do_hugetlb_shm(void)
{
kgid_t shm_group;
shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
}
...
struct file *hugetlb_file_setup(const char *name, size_t size,
vm_flags_t acctflag, struct user_struct **user,
int creat_flags, int page_size_log)
{
...
if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
*user = current_user();
if (user_shm_lock(size, *user)) {
task_lock(current);
pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n",
current->comm, current->pid);
task_unlock(current);
} else {
*user = NULL;
return ERR_PTR(-EPERM);
}
}
...
}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The installation path was changed recently (See 'prefix' in the
Makefile). I forgot to update the README with those changes.
Fix it.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain also why headers are needed.
And some ffix.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is only used for providing 'sigset_t'. We're only documenting
(with some exceptions) the includes needed for constants and the
prototype itself. And 'sigset_t' is better documented in
system_data_types(7). Remove that include.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
All of the constants used by mknod() are defined in <sys/stat.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, there's no use for <unistd.h> here. The prototype is
declared in <sys/mman.h>, and there are no constants needed.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the libkeyutils prototype from the synopsis, which isn't
documented in the rest of the page, and as NOTES says, it's
probably better to use the various library functions.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The constants needed for using this function are defined in
<linux/ipc.h>. Add the include, even when those constants are not
mentioned in this manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Of course that is for the glibc wrapper. As all of the other
pages that don't explicitly say otherwise.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw syscall.
Also remove <linux/time.h> from the includes: 'struct timespec'
is already documented in system_data_types(7), where the
information is more up to date.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw syscall.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the same time, document only headers that are required
for calling the function, or those that are specific to the
function:
<unistd.h> is required for the syscall() prototype.
<sys/syscall.h> is required for the syscall name SYS_xxx.
<linux/futex.h> is specific to this syscall.
However, uint32_t is generic enough that it shouldn't be
documented here. The system_data_types(7) page already documents
it, and is more precise about it. The same goes for timespec.
As a general rule a man[23] page should document the header that
includes the prototype, and all of the headers that define macros
that should be used with the call. However, the information about
types should be restricted to system_data_types(7) (and that page
should probably be improved by adding types), except for types
that are very specific to the call. Otherwise, we're duplicating
info and it's then harder to maintain, and probably outdated in
the future.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This complements commit e3eba861bd.
Since we don't need syscall(2) anymore, we don't need SYS_* definitions.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
On 5/10/21 7:13 PM, Alejandro Colomar (man-pages) wrote:
> Hi Michael,
>
> On 5/10/21 1:39 AM, Michael Kerrisk (man-pages) wrote:
>>> - Specify shebang
>>
>> Why? It's not quite obvious to me, and the commit message
>> should really explain...
>
> Hmmm. I have some minor reasons to add it, but not a really good one.
>
> * Some editors don't recognize 'Makefile' as a special name, so the
> shebang helps detecting which language the file is using (e.g., for
> coloring).
>
> * I tend to subdivide a big Makefile into a small Makefile and many
> submakefiles stored in <./libexec/>. Those obviously need different
> names, and given that the makefile extension is not very standard (I use
> .mk), having a shebang helps knowing what the file is. After that, I
> also have it on the main Makefile for consistency. But here we only
> have one Makefile, so it doesn apply very much.
I think I'll remove it. It is kind of idiosyncratic, leaves the
reader asking "why?".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Ignore everything new by default.
This avoids having to update the .gitignore when we need to ignore
something new. It also avoids accidents that may add an unwanted
temporary file.
Cc: Debian man-pages <manpages@packages.debian.org>
Cc: Dr. Tobias Quathamer <toddy@debian.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fpurge(i_stream) does the same as fflush(i_stream), AFAIK.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AT_EMPTY_PATH works with empty strings (""), but not with NULL
(or at least it's not obvious).
The relevant kernel code is the following:
linux$ sed -n 189,198p fs/namei.c
result->refcnt = 1;
/* The empty path is special. */
if (unlikely(!len)) {
if (empty)
*empty = 1;
if (!(flags & LOOKUP_EMPTY)) {
putname(result);
return ERR_PTR(-ENOENT);
}
}
Reported-by: Walter Harms <wharms@bfs.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See <bits/byteswap.h> in glibc.
These macros call functions of the form __bswap_N(),
which use uintN_t.
Even though it's true that they are macros,
it's transparent to the user.
The user will see their results casted to unsigned types
after the conversion due to the underlying functions,
so it's better to document these as the underlying functions,
specifying the types.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in
pthread_attr_getinheritsched().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getinheritsched
sysdeps/htl/pthread.h:90:
extern int pthread_attr_getinheritsched (const pthread_attr_t *__restrict __attr,
int *__restrict __inheritsched)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:313:
extern int pthread_attr_getinheritsched (const pthread_attr_t *__restrict
__attr, int *__restrict __inherit)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'C library/kernel differences' was added to BUGS incorrectly.
Fix it
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Those pages didn't exist. Fix the section number.
I noticed the typo thanks to the HTML pages on man7.org.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Prerequisites can run in parallel. This wouldn't make any sense
when uninstalling and installing again.
For that, use consecutive commands, which run one after the other
even with multiple cores.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
IMPORTANT for distributions:
This changes prefix to be '/usr/local' as is expected by default,
instead of the old '/usr' value.
- Use standard variables:
- prefix should be '/usr/local'
- mandir (instead of MANDIR)
- htmldir (instead of HTDIR)
- ...
see <https://www.gnu.org/software/make/manual/html_node/Directory-Variables.html>
- Use standard targets:
- html (build html files; don't install them)
- install-html (instead of html)
- installdirs (instead of 'mkdir -p'/'install -d' inside other targets)
- ...
see <https://www.gnu.org/software/make/manual/html_node/Standard-Targets.html#Standard-Targets>
- Use .PHONY
- ?= is not needed. User input overrides any assignment. Use =
- Use standard command variables, instead of directly calling commands.
- $(INSTALL_DATA) (instead of install -m 644)
- $(INSTALL_DIR) (instead of install -d -m 755 or mkdir -p)
see <https://www.gnu.org/software/make/manual/html_node/Command-Variables.html#Command-Variables>
- Specify SHELL = /bin/bash
- Specify shebang
- Allow variable html extension (or no extension at all)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mkdir -p doesn't fail if the directory already exists.
Remove redundant checks.
Use .html as default HTDIR.
Remove checks for undefined HTDIR.
Show what the target does, as with other targets (remove '@').
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I clarified the code about two things:
- Checking how many arguments are being passed.
Here, some functions didn't reject extra arguments when they
weren't being used. Fix that.
I also changed the code to use $#, which is more explicit.
And use arithmetic expressions, which better indicate that
we're dealing with numbers.
- Remove unneeded options from sort.
Reported-by: Stefan Puiu <stefan.puiu@gmail.com>
After Stefan asked about why am I using 'sort -V',
I noticed that I really don't need '-V', and it may confuse
people trying to understand the script, so even though I
slightly prefer the output of 'sort -V', in this case, it's
better to use the simpler 'sort' (yet I need 'sort', to
maintain consistency in the results (find is quite random)).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix the error messages to clearly show that both dirs and manual
pages are accepted, and that more than one argument is accepted.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch makes man_lsfunc() search for the function prototypes,
instead of relying on the current manual page formatting,
which might change in the future, and break this function.
It also simplifies the code, by reusing man_section().
Create a new function sed_rm_ccomments(), which is needed by
man_lsfunc(), and may also be useful in other cases.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The output of 'git status' is not stable.
The more stable 'git status --porcelain' is more complex,
and scripting around it would be more complex.
However, 'git diff --staged --name-only' produces
the output that we were lookiong for.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Neither POSIX or glibc use 'const' in
pthread_mutexattr_setrobust().
Remove it.
.../glibc$ grep_glibc_prototype pthread_mutexattr_setrobust
sysdeps/htl/pthread.h:355:
extern int pthread_mutexattr_setrobust (pthread_mutexattr_t *__attr,
int __robustness)
__THROW __nonnull ((1));
sysdeps/nptl/pthread.h:888:
extern int pthread_mutexattr_setrobust (pthread_mutexattr_t *__attr,
int __robustness)
__THROW __nonnull ((1));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The rest of the page writes the characters without naming them.
Follow that convention.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improved the `getopt(3)` man page in the following ways:
1) Defined the existing term "legitimate option character".
2) Added an additional NOTE stressing that arguments are parsed in strict
order and the implications of this when numeric options are utilised.
Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Despite my mention of this spawning a hilarious discussion
on IRC, this alignment restriction should be 128-bit, not
126-bit.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a missing "to" in an "in order to" formulation.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CIFS flock() locks behave differently than the standard. Give overview
of those differences.
Here is the rendered text:
CIFS details
In Linux kernels up to 5.4, flock() is not propagated over SMB. A file
with such locks will not appear locked for remote clients.
Since Linux 5.5, flock() locks are emulated with SMB byte-range locks
on the entire file. Similarly to NFS, this means that fcntl(2) and
flock() locks interact with one another. Another important side-effect
is that the locks are not advisory anymore: any IO on a locked file
will always fail with EACCES when done from a separate file descriptor.
This difference originates from the design of locks in the SMB proto-
col, which provides mandatory locking semantics.
Remote and mandatory locking semantics may vary with SMB protocol,
mount options and server type. See mount.cifs(8) for additional infor-
mation.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Discussion: linux-man <https://lore.kernel.org/linux-man/20210302154831.17000-1-aaptel@suse.com/>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In b0b19983d9 we removed
<sys/types.h>. For the same reasons there, remove now <sys/ipc.h>
from many pages.
If someone wonders why <sys/ipc.h> was needed, the reason was to
get all the definitions of IPC_* constants. However, that header
is now included by <sys/msg.h>, so it's not needed anymore to
explicitly include it. Quoting POSIX: "In addition, the
<sys/msg.h> header shall include the <sys/ipc.h> header."
There were some remaining cases where I forgot to remove
<sys/types.h>; remove them now too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, all types and constants used by these functions are
defined in <fcntl.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A few architectures have a different call signature for pipe().
Since those architectures are the minority, place the prototype
at the end of the SYNOPSIS, rather than the start.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SEE ALSO: move pam_namespace(8) from namespaces(7) to
mount_namespaces(7) (since pam_namespace(8) makes use of
mount namespaces specifically).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
[alx: ffix + srcfix]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
UFFD_FEATURE_THREAD_ID is supported in Linux 4.14.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
[alx: srcfix]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
My initial reading of this was that type modifiers were probably
not supported. But they are, and this is actually documented
further up, in the type modifiers documentation. But to make it
clearer, let's copy the language that printf(3) has in its %n
section.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw kernel syscall.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<linux/unistd.h> is not needed. We need <unistd.h> for syscall(),
and <sys/syscall.h> for SYS_exit_group.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add <linux/fcntl.h>, which contains AT_* definitions used by
execveat().
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The CLONE_* constants seem to be available from either
<linux/sched.h> or <sched.h>, and since clone3() already
includes <linux/sched.h> for 'struct clone_args', <sched.h>
is not really needed, AFAICS; however, to avoid confusion,
I also included <sched.h> for clone3() for consistency:
clone() is getting CLONE_* from <sched.h>, and it would confuse
the reader if clone3() got the same CLONE_* constants from a
different header.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, there's no reason to include that.
All of the macros that this function uses
are already defined in the other headers.
Cc: glibc <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page didn't specify includes, and the syscalls are extinct, so
instead of adding incomplete information about includes, just
leave it without any includes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the one that provides the prototype doesn't need a comment.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<linux/fs.h> doesn't seem to be needed!
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the one that provides the prototype doesn't need a comment.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: Alex's change switches the comment to the more generally used
form "Definition of..."]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/time.h> is not needed to get the function declaration nor any
constant used by the function. It was only needed (before
POSIX.1) to get 'struct timeval', but that information would be
more suited for system_data_types(7), and not for this page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/time.h> is not required by any of the function declarations
or macro definitions used by these functions. It may be (or maybe
not) needed by some type inside the rlimit structure, but that
info belongs in system_data_types(7), not here.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/types.h> was only needed for size_t, AFAIK. That is already
(and more precisely) documented in system_data_types(7). Let's
remove it here, as it's not really needed for calling add_key().
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I couldn't find a reason for including <unistd.h>. All the macros
used by fcntl() are defined in <fcntl.h>. For comparison, FreeBSD
and OpenBSD don't specify <unistd.h> in their manual pages.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This function never returns to its caller.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ENODATA is an XSI STREAMS extension (not base POSIX).
Linux reused the name for extended attributes.
The current manual pages don't use ENODATA with its POSIX
meaning, so use the xattr(7) specific text, and leave the POSIX
meaning for a secondary paragraph.
Reported-by: Mark Kettenis <kettenis@openbsd.org>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Checked via the latest glibc source. execvpe calls getenv("PATH") and
searches that; the PATH in envp does not affect the search.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In Linux kernel 5.12, a new mode flag, MPOL_F_NUMA_BALANCING, is
added to set_mempolicy() to optimize the page placement among the
NUMA nodes with the NUMA balancing mechanism even if the memory of
the applications is bound with MPOL_BIND. This patch updates the
man page for the new mode flag.
Related kernel commits:
bda420b985054a3badafef23807c4b4fa38a3dff
[mtk: Minor fixes to commit message]
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Michael Kerrisk" <mtk.manpages@gmail.com>
[ alx: srcfix ]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The envp argument specifies the environment of the new process image,
not "the environment of the caller".
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The format string refers to the whole string passed in 'format'.
The syntax referred to is that of a conversion specification,
as called in the manual page.
Use specific language.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Can we add a small syntax structure for format string in printf(3)
manual. I personally find if easier to remember and scan. This has
been taken from OpenBSD printf(3) manual.
Signed-off-by: Utkarsh Singh <utkarsh190601@gmail.com>
[ alx: ffix ]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a sentence explaining what dup2() does in terms of file
descriptors and open file descriptions.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometimes people are confused, thinking a file descriptor is just a
number. To help avoid such confusions, add text highlighting that
a file descriptor is an index to an entry in the process's FD table.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As can be seen by any number of StackOverflow questions, people
persistently misunderstand what dup() does, and the existing manual
page text, which talks of "copying" a file descriptor doesn't help.
Rewrite the text a little to try to prevent some of these
misunderstandings, in particular noting at the start that dup()
allocates a new file descriptor.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
close_range() CLOSE_RANGE_USHARE triggers a call to dup_fd()
which in turn calls alloc_fdtable(), which checks that
sysctl_nr_open has not been exceeded.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current example program can't really be used to demonstrate the
effect of close_range(). Replace it by a program that does show the
effect of this system call.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de,
60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
582f1fb6b721facf04848d2ca57f34468da1813e.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The manual pages are already inconsistent in which headers need
to be included. Right now, not all of the types used by a
function have their required header included in the SYNOPSIS.
If we were to add the headers required by all of the types used by
functions, the SYNOPSIS would grow too much. Not only it would
grow too much, but the information there would be less precise.
Having system_data_types(7) document each type with all the
information about required includes is much more precise, and the
info is centralized so that it's much easier to maintain.
So let's document only the include required for the function
prototype, and also the ones required for the macros needed to
call the function.
<sys/types.h> only defines types, not functions or constants, so
it doesn't belong to man[23] (function) pages at all.
I ignore if some old systems had headers that required you to
include <sys/types.h> *before* them (incomplete headers), but if
so, those implementations would be broken, and those headers
should probably provide some kind of warning. I hope this is not
the case.
[mtk: Already in 2001, POSIX.1 removed the requirement to
include <sys/types.h> for many APIs, so this patch seems
well past due.]
Acked-by: Zack Weinberg <zackw@panix.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX uses 'restrict' in *wprintf() (see [v]fwprintf(3p)).
Let's use it here too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In SYNOPSIS, shift arguments right a little to make the function
names stand out a little more.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wordexp().
Let's use it here too.
.../glibc$ grep_glibc_prototype wordexp
posix/wordexp.h:62:
extern int wordexp (const char *__restrict __words,
wordexp_t *__restrict __pwordexp, int __flags);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wmemcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wmemcpy
wcsmbs/wchar.h:262:
extern wchar_t *wmemcpy (wchar_t *__restrict __s1,
const wchar_t *__restrict __s2, size_t __n) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcstok().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcstok
wcsmbs/wchar.h:217:
extern wchar_t *wcstok (wchar_t *__restrict __s,
const wchar_t *__restrict __delim,
wchar_t **__restrict __ptr) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcscpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcscpy
wcsmbs/wchar.h:87:
extern wchar_t *wcscpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcscat().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcscat
wcsmbs/wchar.h:97:
extern wchar_t *wcscat (wchar_t *__restrict __dest,
const wchar_t *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcrtomb().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcrtomb
wcsmbs/wchar.h:301:
extern size_t wcrtomb (char *__restrict __s, wchar_t __wc,
mbstate_t *__restrict __ps) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcpncpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcpncpy
wcsmbs/wchar.h:556:
extern wchar_t *wcpncpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src, size_t __n)
__THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcpcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcpcpy
wcsmbs/wchar.h:551:
extern wchar_t *wcpcpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in tdelete().
Let's use it here too.
.../glibc$ grep_glibc_prototype tdelete
misc/search.h:138:
extern void *tdelete (const void *__restrict __key,
void **__restrict __rootp,
__compar_fn_t __compar);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in sigwait().
Let's use it here too.
.../glibc$ grep_glibc_prototype sigwait
signal/signal.h:255:
extern int sigwait (const sigset_t *__restrict __set, int *__restrict __sig)
__nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in strptime().
However, glibc doesn't specify 'restrict' for the last parameter.
Let's use the most restrictive form here
(although I believe both to be equivalent).
.../glibc$ grep_glibc_prototype strptime
time/time.h:95:
extern char *strptime (const char *__restrict __s,
const char *__restrict __fmt, struct tm *__tp)
__THROW;
.../glibc$
Cc: <libc-alpha@sourceware.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in stpcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype stpcpy
string/string.h:475:
extern char *stpcpy (char *__restrict __dest, const char *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in statvfs().
Let's use it here too.
.../glibc$ grep_glibc_prototype statvfs
io/sys/statvfs.h:51:
extern int statvfs (const char *__restrict __file,
struct statvfs *__restrict __buf)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in sem_timedwait().
Let's use it here too.
.../glibc$ grep_glibc_prototype sem_timedwait
sysdeps/pthread/semaphore.h:62:
extern int sem_timedwait (sem_t *__restrict __sem,
const struct timespec *__restrict __abstime)
__nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in sem_getvalue().
Let's use it here too.
.../glibc$ grep_glibc_prototype sem_getvalue
sysdeps/pthread/semaphore.h:81:
extern int sem_getvalue (sem_t *__restrict __sem, int *__restrict __sval)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in realpath().
Let's use it here too.
.../glibc$ grep_glibc_prototype realpath
stdlib/stdlib.h:800:
extern char *realpath (const char *__restrict __name,
char *__restrict __resolved) __THROW __wur;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in readdir_r().
Let's use it here too.
.../glibc$ grep_glibc_prototype readdir_r
dirent/dirent.h:183:
extern int readdir_r (DIR *__restrict __dirp,
struct dirent *__restrict __entry,
struct dirent **__restrict __result)
__nonnull ((1, 2, 3)) __attribute_deprecated__;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fputs().
Let's use it here too.
.../glibc$ grep_glibc_prototype fputs
libio/stdio.h:631:
extern int fputs (const char *__restrict __s, FILE *__restrict __stream);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc uses 'restrict' in putgrent().
Let's use it here too.
.../glibc$ grep_glibc_prototype putgrent
grp/grp.h:93:
extern int putgrent (const struct group *__restrict __p,
FILE *__restrict __f);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_getschedparam().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_getschedparam
sysdeps/htl/pthread.h:882:
extern int pthread_getschedparam (pthread_t __thr, int *__restrict __policy,
struct sched_param *__restrict __param)
__THROW __nonnull ((2, 3));
sysdeps/nptl/pthread.h:426:
extern int pthread_getschedparam (pthread_t __target_thread,
int *__restrict __policy,
struct sched_param *__restrict __param)
__THROW __nonnull ((2, 3));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in
pthread_mutexattr_getpshared().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_mutexattr_getpshared
sysdeps/htl/pthread.h:368:
extern int pthread_mutexattr_getpshared(const pthread_mutexattr_t *__restrict __attr,
int *__restrict __pshared)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:830:
extern int pthread_mutexattr_getpshared (const pthread_mutexattr_t *
__restrict __attr,
int *__restrict __pshared)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_attr_getscope().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getscope
sysdeps/htl/pthread.h:125:
extern int pthread_attr_getscope (const pthread_attr_t *__restrict __attr,
int *__restrict __contentionscope)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:324:
extern int pthread_attr_getscope (const pthread_attr_t *__restrict __attr,
int *__restrict __scope)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_attr_getschedpolicy().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getschedpolicy
sysdeps/htl/pthread.h:113:
extern int pthread_attr_getschedpolicy (const pthread_attr_t *__restrict __attr,
int *__restrict __policy)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:304:
extern int pthread_attr_getschedpolicy (const pthread_attr_t *__restrict
__attr, int *__restrict __policy)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX uses 'restrict' in posix_spawnp().
However, glibc doesn't.
Let's document here the more restrictive of them, which is POSIX.
I reported a bug to glibc about this.
$ man 3p posix_spawnp |sed -n '/^SYNOPSIS/,/;/p'
SYNOPSIS
#include <spawn.h>
int posix_spawnp(pid_t *restrict pid, const char *restrict file,
const posix_spawn_file_actions_t *file_actions,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict], char *const envp[restrict]);
$
.../glibc$ grep_glibc_prototype posix_spawnp
posix/spawn.h:85:
extern int posix_spawnp (pid_t *__pid, const char *__file,
const posix_spawn_file_actions_t *__file_actions,
const posix_spawnattr_t *__attrp,
char *const __argv[], char *const __envp[])
__nonnull ((2, 5));
.../glibc$
I conciously did an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way,
or shifting all of the parameters to the left, unaligned with
respect to the function parentheses.
Bug: glibc <https://sourceware.org/bugzilla/show_bug.cgi?id=27529>
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Cc: glibc <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in posix_spawn().
Let's use it here too.
.../glibc$ grep_glibc_prototype posix_spawn
posix/spawn.h:72:
extern int posix_spawn (pid_t *__restrict __pid,
const char *__restrict __path,
const posix_spawn_file_actions_t *__restrict
__file_actions,
const posix_spawnattr_t *__restrict __attrp,
char *const __argv[__restrict_arr],
char *const __envp[__restrict_arr])
__nonnull ((2, 5));
.../glibc$
I conciously did an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way,
or shifting all of the parameters to the left, unaligned with
respect to the function parentheses.
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mq_setattr().
Let's use it here too.
.../glibc$ grep_glibc_prototype mq_setattr
rt/mqueue.h:51:
extern int mq_setattr (mqd_t __mqdes,
const struct mq_attr *__restrict __mqstat,
struct mq_attr *__restrict __omqstat)
__THROW __nonnull ((2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mbtowc().
Let's use it here too.
.../glibc$ grep_glibc_prototype mbtowc
stdlib/stdlib.h:925:
extern int mbtowc (wchar_t *__restrict __pwc,
const char *__restrict __s, size_t __n) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mbrlen().
Let's use it here too.
.../glibc$ grep_glibc_prototype mbrlen
wcsmbs/wchar.h:307:
extern size_t mbrlen (const char *__restrict __s, size_t __n,
mbstate_t *__restrict __ps) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX.1-2001 and glibc use 'restrict' in swapcontext().
Let's use it here too.
.../glibc$ grep_glibc_prototype swapcontext
stdlib/ucontext.h:41:
extern int swapcontext (ucontext_t *__restrict __oucp,
const ucontext_t *__restrict __ucp)
__THROWNL __INDIRECT_RETURN;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in lio_listio().
However, POSIX is a bit more restrictive than glibc
for the second parameter.
Let's document the more restrictive POSIX variant.
$ man 3p lio_listio |sed -n '/^SYNOPSIS/,/;/p'
SYNOPSIS
#include <aio.h>
int lio_listio(int mode, struct aiocb *restrict const list[restrict],
int nent, struct sigevent *restrict sig);
$
.../glibc$ grep_glibc_prototype lio_listio
rt/aio.h:148:
extern int lio_listio (int __mode,
struct aiocb *const __list[__restrict_arr],
int __nent, struct sigevent *__restrict __sig)
__THROW __nonnull ((2));
.../glibc$
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: glibc <libc-alpha@sourceware.org>
Bug: glibc <https://sourceware.org/bugzilla/show_bug.cgi?id=16747>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in inet_pton().
Let's use it here too.
.../glibc$ grep_glibc_prototype inet_pton
inet/arpa/inet.h:58:
extern int inet_pton (int __af, const char *__restrict __cp,
void *__restrict __buf) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in glob().
Let's use it here too.
.../glibc$ grep_glibc_prototype glob
posix/glob.h:146:
extern int glob (const char *__restrict __pattern, int __flags,
int (*__errfunc) (const char *, int),
glob_t *__restrict __pglob) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in getnameinfo().
Let's use it here too.
I consciously did an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way.
.../glibc$ grep_glibc_prototype getnameinfo
resolv/netdb.h:675:
extern int getnameinfo (const struct sockaddr *__restrict __sa,
socklen_t __salen, char *__restrict __host,
socklen_t __hostlen, char *__restrict __serv,
socklen_t __servlen, int __flags);
.../glibc$
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX does NOT specify these functions to use 'restrict'.
However, glibc uses 'restrict' in getgrnam_r(), getgrgid_r().
Users might be surprised by this! Let's use it here too!
.../glibc$ grep_glibc_prototype getgrnam_r
grp/grp.h:148:
extern int getgrnam_r (const char *__restrict __name,
struct group *__restrict __resultbuf,
char *__restrict __buffer, size_t __buflen,
struct group **__restrict __result);
.../glibc$ grep_glibc_prototype getgrgid_r
grp/grp.h:140:
extern int getgrgid_r (__gid_t __gid, struct group *__restrict __resultbuf,
char *__restrict __buffer, size_t __buflen,
struct group **__restrict __result);
.../glibc$
Cc: glibc <libc-alpha@sourceware.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fgetpos().
Let's use it here too.
glibc:
============================= fgetpos
libio/stdio.h:736:
int fgetpos (FILE *restrict stream, fpos_t *restrict pos);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in these functions.
Let's use it here too.
glibc:
============================= fread
libio/stdio.h:651:
size_t fread (void *restrict ptr, size_t size,
size_t n, FILE *restrict stream) wur;
============================= fwrite
libio/stdio.h:657:
size_t fwrite (const void *restrict ptr, size_t size,
size_t n, FILE *restrict s);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fputws().
Let's use it here too.
glibc:
============================= fputws
wcsmbs/wchar.h:765:
int fputws (const wchar_t *restrict ws,
FILE *restrict stream);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' for fgetws().
Let's use it here too.
glibc:
wcsmbs/wchar.h:758:
wchar_t *fgetws (wchar_t *restrict ws, int n,
FILE *restrict stream);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fgets().
Let's use it here too.
glibc:
libio/stdio.h:568:
char *fgets (char *restrict s, int n, FILE *restrict stream)
wur attr_access ((write_only__, 1, 2));
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX.1-2001 and glibc use 'restrict' for these functions.
Let's use it here too.
glibc:
============================= ecvt
stdlib/stdlib.h:872:
char *ecvt (double value, int ndigit, int *restrict decpt,
int *restrict sign) THROW nonnull ((3, 4)) wur;
============================= fcvt
stdlib/stdlib.h:878:
char *fcvt (double value, int ndigit, int *restrict decpt,
int *restrict sign) THROW nonnull ((3, 4)) wur;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'restrict' for some of the functions in this page:
============================= drand48_r
stdlib/stdlib.h:501:
int drand48_r (struct drand48_data *restrict buffer,
double *restrict result) THROW nonnull ((1, 2));
============================= erand48_r
stdlib/stdlib.h:503:
int erand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
double *restrict result) THROW nonnull ((1, 2));
============================= lrand48_r
stdlib/stdlib.h:508:
int lrand48_r (struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= nrand48_r
stdlib/stdlib.h:511:
int nrand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= mrand48_r
stdlib/stdlib.h:517:
int mrand48_r (struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= jrand48_r
stdlib/stdlib.h:520:
int jrand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= srand48_r
stdlib/stdlib.h:526:
int srand48_r (long int seedval, struct drand48_data *buffer)
THROW nonnull ((2));
============================= seed48_r
stdlib/stdlib.h:529:
int seed48_r (unsigned short int seed16v[3],
struct drand48_data *buffer) THROW nonnull ((1, 2));
============================= lcong48_r
stdlib/stdlib.h:532:
int lcong48_r (unsigned short int param[7],
struct drand48_data *buffer)
THROW nonnull ((1, 2));
Let's use it here too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of memcpy()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
It's especially important in memcpy(),
as it's been a historical source of bugs.
......
.../glibc$ grep_glibc_prototype memcpy
posix/regex_internal.h:746:
{
memcpy (dest, src, sizeof (bitset_t));
string/string.h:43:
extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
size_t __n) __THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Epoch is 1970-01-01 00:00:00 +0000, UTC (see time(7)).
Reported-by: Walter Franzini <walter.franzini@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both functions have the same header.
There's no reason to separate the prototypes repeating the header.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
RESOLVE_CACHED allows an application to attempt a cache-only open
of a file. If this isn't possible, the request will fail with
-1/EAGAIN and the caller should retry without RESOLVE_CACHED set.
This will generally happen from a different context, where a slower
open operation can be performed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make it clear that netlink error responses (i.e., messages with
type NLMSG_ERROR (0x2)), can be longer than sizeof(struct
nlmsgerr). In certain circumstances, the payload can be longer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
That file should be sourced (.) from 'bashrc' (or 'bash_aliases').
It contains functions that are useful for the maintenance of this
project.
- grep_syscall()
- grep_syscall_def()
- man_section()
- man_lsfunc()
- pdfman()
- grep_glibc_prototype()
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX does NOT specify aio_suspend() to use 'restrict'.
However, glibc uses 'restrict'.
Users might be surprised by this! Let's use it here too!
......
.../glibc$ grep_glibc_prototype aio_suspend
rt/aio.h:167:
extern int aio_suspend (const struct aiocb *const __list[], int __nent,
const struct timespec *__restrict __timeout)
__nonnull ((1));
.../glibc$
Cc: libc-alpha@sourceware.org
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that [sig]longjmp() shall not return,
transferring control back to the caller of [sig]setjmp().
Glibc uses __attribute__((__noreturn__)) for [sig]longjmp().
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that pthread_exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses __attribute__((__noreturn__)) for [v]err[x]().
These functions never return.
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that _exit() and _Exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that abort() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Switching into the man? subdirectories when running man2html(1)
caused a bug where ".so dir/page.n" links were misinterpreted
(because the directory prefix was interpreted with respect to
the current directory)i, and consequently, the link files
were not correctly rendered. There's no need to switch into the
subdirectories.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This page uses some idiosyncratic mark-up involving the use of
a groff register. The mark-up actually makes no difference to
the formatted result, but does cause man2html(1) to emit error
messages, since it does not understand the mark-up. So, remove
that mark-up.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use the glibc prototypes instead of the kernel ones.
Exception: use 'int' instead of 'enum'.
......
.../glibc$ grep_glibc_prototype pciconfig_read
sysdeps/unix/sysv/linux/alpha/sys/io.h:72:
extern int pciconfig_read (unsigned long int __bus,
unsigned long int __dfn,
unsigned long int __off,
unsigned long int __len,
unsigned char *__buf) __THROW;
sysdeps/unix/sysv/linux/ia64/sys/io.h:57:
extern int pciconfig_read (unsigned long int __bus, unsigned long int __dfn,
unsigned long int __off, unsigned long int __len,
unsigned char *__buf);
.../glibc$ grep_glibc_prototype pciconfig_write
sysdeps/unix/sysv/linux/alpha/sys/io.h:78:
extern int pciconfig_write (unsigned long int __bus,
unsigned long int __dfn,
unsigned long int __off,
unsigned long int __len,
unsigned char *__buf) __THROW;
sysdeps/unix/sysv/linux/ia64/sys/io.h:61:
extern int pciconfig_write (unsigned long int __bus, unsigned long int __dfn,
unsigned long int __off, unsigned long int __len,
unsigned char *__buf);
.../glibc$ grep_glibc_prototype pciconfig_iobase
sysdeps/unix/sysv/linux/alpha/sys/io.h:66:
extern long pciconfig_iobase(enum __pciconfig_iobase_which __which,
unsigned long int __bus,
unsigned long int __dfn)
__THROW __attribute__ ((const));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
All but the last parameters of t[g]kill() use 'pid_t',
both in the kernel and glibc. Fix them.
......
.../linux/linux$ grep_syscall tkill
kernel/signal.c:3870:
SYSCALL_DEFINE2(tkill, pid_t, pid, int, sig)
include/linux/syscalls.h:685:
asmlinkage long sys_tkill(pid_t pid, int sig);
.../linux/linux$
.../gnu/glibc$ grep_glibc_prototype tgkill
sysdeps/unix/sysv/linux/bits/signal_ext.h:29:
extern int tgkill (__pid_t __tgid, __pid_t __tid, int __signal);
.../gnu/glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The kernel syscall uses 'loff_t', but the glibc wrapper uses 'off64_t'.
Let's document the wrapper prototype, as in other pages.
......
.../glibc$ grep_glibc_prototype splice
sysdeps/unix/sysv/linux/bits/fcntl-linux.h:398:
extern __ssize_t splice (int __fdin, __off64_t *__offin, int __fdout,
__off64_t *__offout, size_t __len,
unsigned int __flags);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type of fsgid is git_t, and not uid_t. Fix it.
......
.../glibc$ grep_glibc_prototype setfsgid
sysdeps/unix/sysv/linux/sys/fsuid.h:31:
extern int setfsgid (__gid_t __gid) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
"Mibibytes" is a misspelling of "mebibytes",
but let's use more familiar "MiB" instead.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I just happened upon this inconsistent text while reading `man 2
execve`. The code in question landed in 2.6.23 as b6a2fea39318
("mm: variable length argument support").
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of timer_settime()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype timer_settime
time/time.h:242:
extern int timer_settime (timer_t __timerid, int __flags,
const struct itimerspec *__restrict __value,
struct itimerspec *__restrict __ovalue) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'restrict' for the types of the parameters of statx().
Let's use it here too.
......
.../glibc$ grep_glibc_prototype statx
io/bits/statx-generic.h:60:
int statx (int __dirfd, const char *__restrict __path, int __flags,
unsigned int __mask, struct statx *__restrict __buf)
__THROW __nonnull ((2, 5));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of sigaltstack()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype sigaltstack
signal/signal.h:320:
extern int sigaltstack (const stack_t *__restrict __ss,
stack_t *__restrict __oss) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of getsockopt()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype getsockopt
socket/sys/socket.h:208:
extern int getsockopt (int __fd, int __level, int __optname,
void *__restrict __optval,
socklen_t *__restrict __optlen) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of getpeername()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype getpeername
socket/sys/socket.h:130:
extern int getpeername (int __fd, __SOCKADDR_ARG __addr,
socklen_t *__restrict __len) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page used 'hint' and 'advice' synonymously. This leaves the
reader wondering if the terms mean the same thing, or different
things. They mean the same thing, so use just one term.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than repeating the description of MADV_COLD and MADV_PAGEOUT
in two pages, centralize the discussion in madvise(2), and refer
from process_madvise(2) ro madvise(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Skimming open(2), I was surprised not to see tmpfs mentioned as a
filesystem supported by O_TMPFILE.
If I'm understanding correctly (I'm very possibly not!), tmpfs is
a filesystem built on shmem, so I think it's more correct (and
probably much more widely understandable) to refer to tmpfs here.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are many slightly different prototypes for this syscall,
but none of them is like the documented one.
Of all the different prototypes,
let's document the asm-generic one.
This manual page was actually using a prototype similar to
mmap(2), but there's no glibc wrapper function called mmap2(2),
as the wrapper for this syscall is mmap(2). Therefore, the
documented prototype should be the kernel one.
......
.../linux$ grep_syscall mmap2
arch/csky/kernel/syscall.c:17:
SYSCALL_DEFINE6(mmap2,
unsigned long, addr,
unsigned long, len,
unsigned long, prot,
unsigned long, flags,
unsigned long, fd,
off_t, offset)
arch/microblaze/kernel/sys_microblaze.c:46:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/nds32/kernel/sys_nds32.c:12:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
arch/powerpc/kernel/syscalls.c:60:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
arch/riscv/kernel/sys_riscv.c:37:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, off_t, offset)
arch/s390/kernel/sys_s390.c:49:
SYSCALL_DEFINE1(mmap2, struct s390_mmap_arg_struct __user *, arg)
arch/sparc/kernel/sys_sparc_32.c:101:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/ia64/include/asm/unistd.h:30:
asmlinkage unsigned long sys_mmap2(
unsigned long addr, unsigned long len,
int prot, int flags,
int fd, long pgoff);
arch/ia64/kernel/sys_ia64.c:139:
asmlinkage unsigned long
sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, long pgoff)
arch/m68k/kernel/sys_m68k.c:40:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/parisc/kernel/sys_parisc.c:275:
asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags, unsigned long fd,
unsigned long pgoff)
arch/powerpc/include/asm/syscalls.h:15:
asmlinkage long sys_mmap2(unsigned long addr, size_t len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
arch/sh/include/asm/syscalls.h:8:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
arch/sh/kernel/sys_sh.c:41:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/sparc/kernel/systbls.h:23:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
include/asm-generic/syscalls.h:14:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
.../linux$
function grep_syscall()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} <syscall>";
return ${EX_USAGE};
fi
find * -type f \
|grep '\.c$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^\w*SYSCALL_DEFINE.\(${1},.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
find * -type f \
|grep '\.[ch]$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^asmlinkage\s+[\w\s]+\**sys_${1}\s*\(.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
}
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The documented prototype for mlock2() was a mix of the
glibc wrapper prototype and the kernel syscall prototype.
Let's document the glibc wrapper prototype, which is shown below.
......
.../glibc$ grep_glibc_prototype mlock2
sysdeps/unix/sysv/linux/bits/mman-shared.h:55:
int mlock2 (const void *__addr, size_t __length, unsigned int __flags) __THROW;
.../glibc$
function grep_glibc_prototype()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} <func>";
return ${EX_USAGE};
fi
find * -type f \
|grep '\.h$' \
|sort -V \
|xargs pcregrep -Mn \
"(?s)^[^\s#][\w\s]+\s+\**${1}\s*\([\w\s()[\]*,]*?(...)?\)[\w\s()]*;" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
}
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There seems to be no reason <unistd.h> is shown here, so remove it.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These pages have the odd wording 'the external variable errno',
which does not occur in other pages. Make these pages conform with
the norm.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fileno(3) differs from the other functions in various ways.
For example, it is governed by different standards,
and can set 'errno'. Conversely, the other functions
are about examining the status of a stream, while
fileno(3) simply obtains the underlying file descriptor.
Furthermore, splitting this function out allows
for some cleaner upcoming changes in ferror(3).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Calling ipc() directly would be a rather unusual thing to do,
so add some text to emphasize that point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I filed a bug against glibc
requesting the wrapper for the new syscall.
Glibc bug: <https://sourceware.org/bugzilla/show_bug.cgi?id=27359>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The 'advice' subsection fell in the middle of other text in the
DESCRIPTION, which is a little confusing. Instead, move that
subsection to the end of the DESCRIPTION, and make some other
minor text reorganization so that related details are placed in
the same paragraphs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'malloc_trim' was and is never called from the 'free' function.
see related bug in glibc tracker:
https://sourceware.org/bugzilla/show_bug.cgi?id=2531. or
'__int_free' function. Only the top part of the heap is trimmed
after some calls to 'free', which is different from 'malloc_trim'
which also releases memory in between chunks from all the
arenas/heaps.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Move the the text describing how to set environment variable before
the list(s) of variables in order to improve readability.
[mtk: rewrote commit message]
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clearly document that HOME, LOGNAME, SHELL and USER are set at
login time by a program like such as login(1).
Document also that using su could result in a mixed environment,
and point to the su(1) manual page.
[mtk: edited commit message]
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add more details of how PATH is used, and mention the legacy
use of an empty prefix.
Changed after a suggested patch by Bastien Roucariès.
Reported-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Unlike SIOCGIFADDR and SIOCSIFADDR which are supported by many
protocol families, SIOCDIFADDR is supported by AF_INET6 and
AF_APPLETALK only.
Unlike other protocols, AF_INET6 uses struct in6_ifreq.
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the name=value system and that nul byte is forbidden.
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Say "execve(2)" instead of "exec(3)", and note that this step
starts a new program (not a new process!).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using _GNU_SOURCE to obtain the declaration of 'environ' is
nonstandard. Therefore, move the mention of this detail to NOTES.
At the same time, add a few words proposed by Bastien.
Cowritten-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In man-pages-5.11, a large number of pages were edited to achieve
greater consistency in the SYNOPIS, RETURN VALUE and ATTRIBUTES
sections. To avoid future inconsistencies, try to capture some of
the preferred conventions in text in man-pages(7).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The msg_name field for recvmsg() call points to a caller-allocated buffer
nladdr that is used to return the source address of the (netlink) socket.
As recvmsg() does not read this buffer and fills it for a caller, do not
initialize it and instead check its value in the example.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Drop the reference to the Hacker Writing Guide (and the broken URL)
and simply note that the logical quoting style is the norm in
European languages also.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A real minus can be cut and pasted...
THere are a few exceptions that gave been excluded in the this
change. For example, where there' is a string such as "<p1-name>",
where p1-name is soome sort of pseudo-identifier.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It works both way, but this one feels more right. We are reading
four elements sizeof(*buffer) bytes each.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified from reading the kernel source and looking at the source
of mount(8). Surprisingly, this has not documented after so many
years.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mtk: Enke later noted that this patch provides better documentation
of longstanding behavior (rather documenting a change in behavior).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The environ(7) man page says:
SHELL The pathname of the user's login shell.
PAGER The user's preferred utility to display text files.
EDITOR/VISUAL
The user's preferred utility to edit text files.
but doesn't say whether the pathnames must be absolute or they can
be resolved using $PATH, or whether they can have options.
Note that at least for SHELL, this is not specified by POSIX.
This issue was raised in the Austin Group mailing-list, and the
answer is that "what constitutes a valid value for a platform
should be documented" [1].
Since OpenSSH assumes that $SHELL is an absolute pathname (when
set), it is supposed that the documentation should be:
SHELL The absolute pathname of the user's login shell.
For PAGER, POSIX says: "Any string acceptable as a command_string
operand to the sh -c command shall be valid."
For EDITOR, it does not need to be an absolute pathname since
POSIX gives the example:
EDITOR=vi fc
and since it is specified as "the name of a utility", It assumes
that arguments (options) must not be provided. Page 3013 about
"more", it is said: "If the last pathname component in EDITOR is
either vi or ex, [...]", thus again, it is assumed to be a
pathname.
For VISUAL, POSIX says: "Determine a pathname of a utility to
invoke when the visual command [...]", thus it is also a pathname.
It is not clear whether the pathname must be absolute, but for
consistency with EDITOR, it will be resolved using $PATH.
[1] https://www.mail-archive.com/austin-group-l@opengroup.org/msg01399.html
Reported-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Bastien Roucaries <rouca@debian.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When I inherited man-pages in 2004, it was a hodge-podge mix of
American vs British spelling. My native spelling is the latter,
but I value consistency and felt that things needed to be
standardized on one or other, and in computing, American is the
norm so that is what I settled on.
Among the changes was the substitution of various instances
of "acknowledgement" for "acknowledgment". The latter spelling is
not one I care for, but I believed it to be the American norm.
Alex Colomar proposed a patch to change the spelling back
to "acknowledgement", and some discussion and investigation
ensued, whereby I learned the following:
* The situation is not clear cut.
* Historically, "acknowledgment" was the norm in British English,
but was eclipsed by "acknowledgement" some decades ago.
* Even in American English, "acknowledgment" is not universal,
and "acknowledgement" has become more common in recent decades
(although it still remains minority usage) [2].
* The BSD license uses "acknowledgement" even though it was
(presumably) written in California.
* The POSIX standard uses "acknowledgement".
* The Debian BTS uses "acknowledgement".
* Looking at a corpus of manual pages from various systems
that I have assembled over the years, "acknowledgement" seems
a little more common than "acknowledgment".
Summary: the situation is not clear cut, but let's follow BSD,
POSIX, and the personal preference of the man-pages maintainers.
[1] https://books.google.com/ngrams/graph?content=acknowledgment%2Cacknowledgement&year_start=1800&year_end=2019&corpus=29&smoothing=3#
[2] https://books.google.com/ngrams/graph?content=acknowledgment%2Cacknowledgement&year_start=1800&year_end=2000&corpus=5&smoothing=3&
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For the alternate signal stack to be cleared, CLONE_VM should and
CLONE_VFORK should not be specified.
[mtk: fixes my commit 52e5819c41]
Signed-off-by: Johannes Wellhöfer <johannes.wellhofer@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
man-pages bug: 211029
https://bugzilla.kernel.org/show_bug.cgi?id=211029
Complete workaround example
(it was too long for the page, but it may be useful here):
......
$ sudo ln -s -T /usr/bin/echo /usr/bin/-echo;
$ cc -o system_hyphen -x c - ;
#include <stdlib.h>
int
main(void)
{
system(" -echo Hello world!");
exit(EXIT_SUCCESS);
}
$ ./system_hyphen;
Hello world!
Reported-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
NETLINK_CAP_ACK option was introduced in commit 0a6a3a23ea6e which first
appeared in Linux version 4.3 and not 4.2.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
rtnetlink is not only used for IPv4
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The parentheses here make it look like a function rather than a
command.
This was a typo introduced by a script-assisted global edit.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's been a long time sine kernel 3.19.
There's still no glibc wrapper.
......
$ grep -rn 'execveat *(' glibc/
$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'void *' instead of 'char *'.
And the prototype is declared in <sys/cachectl.h>.
......
$ syscall='cacheflush';
$ ret='int';
$ find glibc/ -type f -name '*.h' \
|xargs pcregrep -Mn "(?s)^[\w\s]*${ret}\s*${syscall}\s*\(.*?;";
glibc/sysdeps/unix/sysv/linux/nios2/sys/cachectl.h:27:
extern int cacheflush (void *__addr, const int __nbytes, const int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/mips/sys/cachectl.h:35:
extern int cacheflush (void *__addr, const int __nbytes, const int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/arc/sys/cachectl.h:30:
extern int cacheflush (void *__addr, int __nbytes, int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/csky/sys/cachectl.h:30:
extern int cacheflush (void *__addr, const int __nbytes,
const int __op) __THROW;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Expand the epoll_wait() page with epoll_pwait2(), an epoll_wait()
variant that takes a struct timespec to enable nanosecond
resolution timeout.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A file descriptor is an int so it should be stored through an int
pointer while parent_tid should have the same type as child_tid
which is pid_t pointer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A more detailed notice is on realloc(3p).
......
$ man 3p realloc \
|sed -n \
-e '/APPLICATION USAGE/,/^$/p' \
-e '/FUTURE DIRECTIONS/,/^$/p';
APPLICATION USAGE
The description of realloc() has been modified from pre‐
vious versions of this standard to align with the
ISO/IEC 9899:1999 standard. Previous versions explicitly
permitted a call to realloc(p, 0) to free the space
pointed to by p and return a null pointer. While this be‐
havior could be interpreted as permitted by this version
of the standard, the C language committee have indicated
that this interpretation is incorrect. Applications
should assume that if realloc() returns a null pointer,
the space pointed to by p has not been freed. Since this
could lead to double-frees, implementations should also
set errno if a null pointer actually indicates a failure,
and applications should only free the space if errno was
changed.
FUTURE DIRECTIONS
This standard defers to the ISO C standard. While that
standard currently has language that might permit real‐
loc(p, 0), where p is not a null pointer, to free p while
still returning a null pointer, the committee responsible
for that standard is considering clarifying the language
to explicitly prohibit that alternative.
Bug: 211039 <https://bugzilla.kernel.org/show_bug.cgi?id=211039>
Reported-by: Johannes Pfister <johannes.pfister@josttech.ch>
Cc: libc-alpha@sourceware.org
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
.PP are redundant just after .SH or .SS.
Remove them.
$ find man? -type f \
|xargs sed -i '/^\.S[HS]/{n;/\.PP/d}';
Plus a couple manual edits.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is implied in every other manual page. There is no need to
state it explicitly in these pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a glitch in commit ff91beca5b.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
.PP and .IP are redundant just before .SH or .SS.
Remove them.
$ find man? -type f \
|xargs sed -i '/^\.[IP]P$/{N;s/.*\n\(\.S[HS]\)/\1/}';
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit a280d6dc77eb
("ipc/sem: introduce semctl(SEM_STAT_ANY)"),
it only skips read access check when using SEM_STAT_ANY command.
And it should use the semid_ds struct instead of seminfo struct.
Fix this.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used in to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, miscellaneous descriptions of the setting
of 'errno' are reworded to the norm of "is set to indicate
the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used in to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, reword various cases saying that 'errno' is set
"appropriately" to "is set to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used in to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, fix some instances stating that 'errno' is set
"appropriately" to instead say "to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used in to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, "to indicate the cause of the error"
is changed to "to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current mark-up renders poorly. To resolve this, move
the type information into a separate line.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since we are using .nf/.fi to bracket FTM info, escaping
space characters serves no space and clutters the source.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Group macros by kinds.
- Align so that it's easiest to distinguish differences
between related macros.
(Align all continuations for consistency on PDF.)
- Fix minor typos.
- Remove redundant text:
'The macro xxx() ...':
The first paragraph already says that these are macros.
'circular|tail|... queue':
Don't need to repeat every time.
Generic text makes it easier to spot the differences.
- Fit lines into 78 columns.
- Reorder descriptions to match SYNOPSIS,
and add subsections to DESCRIPTION.
- srcfix: fix a few semantic newlines.
I noticed a bug which should be fixed next:
CIRCLEQ_LOOP_*() return a 'struct TYPE *'.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change `RES_USE_EDNSO` to `RES_USE_EDNS0`, defined in
`resolv.h`. (This is written correctly in `man3/resolver.3` in this
same repo.) Helps with grepping and internet searches!
Signed-off-by: John Morris <john@zultron.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Various ATTRIBUTES table improvements following the previous
commit. In particular, make use of T{...T} to allow wrapping
in table cells that have a lot of text.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the formatting more consistent inside the tables in the
ATTRIBUTES sections. Make the source code more uniform; in
particular, eliminate the use of custom tweaks using
'lbwNN'/'lwNN' and .br macros. In addition, ensure that
hyphenation and text justification do not occur inside the tables.
This is a script-driven edit:
[[
PAGE_LIST=$(git grep -l 'SH ATTRIBUTES' man[23])
# Strip out any preexisting .sp/.br/.ad macros
sed -i '/SH ATTR/,/^\.SH/{/^\.sp/d; /^\.br/d; /\.ad/d}' $PAGE_LIST
# Eliminate any use of 'wNN' in tables; default first column
# to fill unused space
sed -i '/SH ATTR/,/^\.SH/s/lbw[0-9]*/lb/g' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/s/lw[0-9]*/l/g' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/s/^lb /lbx /' $PAGE_LIST
# Nest the tables inside ".ad l"+".nh" and ".hy"+".ad"+".sp 1"
# ".ad l" ==> no right justification of text in table cells
# ".nh" ==> No hyphenation in table cells
# ".sp 1" ==> ensure a blank line before the next section heading
sed -i '/SH ATTR/,/^\.SH/{/\.TS/i.ad l\n.nh
}' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/{/\.TE/a.hy\n.ad\n.sp 1
}' $PAGE_LIST
# In a few of the tables, the third column has a lot of text, so
# make that column wide (rather than the first column)
sed -i '/^lbx/{s/lbx/lb/;s/lb$/lbx/}' \
man3/bindresvport.3 \
man3/fmtmsg.3 man3/gethostbyname.3 man3/getlogin.3 \
man3/getnetent.3 man3/getprotoent.3 man3/getpwent.3 \
man3/getservent.3 man3/getspnam.3 man3/getutent.3 man3/glob.3 \
man3/login.3 \
man3/setnetgrent.3 \
man3/wordexp.3
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the SYNOPSIS, a long function prototype may need to be
continued over to the next line. The continuation line is
indented according to the following rules:
1. If there is a single such prototype that needs to be continued,
then align the continuation line so that when the page is
rendered on a fixed-width font device (e.g., on an xterm) the
continuation line starts just below the start of the argument
list in the line above. (Exception: the indentation may be
adjusted if necessary to prevent a very long continuation line
or a further continuation line where the function prototype is
very long.)
Thus:
int tcsetattr(int fd, int optional_actions,
const struct termios *termios_p);
2. But, where multiple functions in the SYNOPSIS require
continuation lines, and the function names have different
lengths, then align all continuation lines to start in the
same column. This provides a nicer rendering in PDF output
(because the SYNOPSIS uses a variable width font where
spaces render narrower than most characters).
Thus:
int getopt(int argc, char * const argv[],
const char *optstring);
int getopt_long(int argc, char * const argv[],
const char *optstring,
const struct option *longopts, int *longindex);
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since _BSD_SOURCE is obsolete for quite some time now,
it should not be listed as the first FTM for lstat().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Generally, place '||' at start of a line, rather than the end of
the previous line.
Rationale: this placement clearly indicates that that each piece
is an alternative.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Different source styles are used in different pages to achieve the
same formatted output, and in some cases the source mark-up is a
rather convoluted combination of .RS/.RE/.TP/.PD macros. Simplify
this greatly, and unify all of the pages to use more or less the
same source code style. This makes the source code rather easier
to read, and may simplify future scripted global changes.
The feature test macro info is currently bracketed by .nf/.fi
pairs. This is not strictly necessary (i.e., it makes no
difference to the rendered output), but for the moment we keep
these "brackets" in case they may be replaced with something else.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This rather ancient FTM is not mentioned in other pages for
reasons discussed in feature_test_macros(7). Remove this FTM
from the three pages where it does appear.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The majority of pages use .nf/.fi in SYNOPSIS, but there are
still many that don't and use .br to achieve newlines. Fix many
of those. This brings greater consistency to the pages, which
eases editing and may ease future scripted edits to the pages.
Many of these changes were script-assisted, with some additional
manual edits.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Yet more clean-ups after commit
15d6565317.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.29, there is a wrapper for getcpu(2).
The wrapper has only 2 arguments, omitting the unused
third system call argument. Rework the manual page
to reflect this.
Reported-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Most pages use 'unsigned int' (and the kernel too).
Make them all do so.
$ find man? -type f \
| xargs sed -i \
-e 's/unsigned \*/unsigned int */g'
-e 's/unsigned "/unsigned int "/g';
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The use of vertical white space in the SYNOPSIS sections
is rather inconsistent. Make it more consistent, subject to the
following heuristics:
* Prefer no blank lines between function signatures by default.
* Where many functions are defined in the SYNOPSIS, add blank
lines where needed to improve readability, possibly by using
blank lines to separate logical groups of functions.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The functions are all declared in <netdb.h>. <sys/socket.h> is only
needed for the AF_* constants.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Bring a bit more consistency to Feature Test Macro information
(mainly .PP between differnt FTM lists).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use .PP (which gives a bit of vertical white space) rather than
.br to separate functions in Feature Test Macro requirement lists.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses 'unsigned int' instead of 'int' for the
'flags' parameter. As glibc provides no wrapper, use the same
type the kernel uses.
......
$ syscall='delete_module';
$ find linux/ -type f -name '*.c' \
|xargs pcregrep -Mn "(?s)^[\w_]*SYSCALL_DEFINE.\(${syscall},.*?\)";
linux/kernel/module.c:977:
SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
unsigned int, flags)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See previous commit.
This commit normalizes texts under sections other than SYNOPSIS
(most of them in NOTES).
Signed-off-by: Ganimedes Colomar <gacoan.linux@gmail.com>
Cowritten-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To easily distinguish documentation about glibc wrappers from
documentation about kernel syscalls, let's have a normalized
'Note' in the SYNOPSIS, and a further explanation in the page body
(NOTES in most of them), as already happened in many (but not all)
of the manual pages for syscalls without a wrapper. Furthermore,
let's normalize the messages, following membarrier.2 (because it's
already quite extended), so that it's easy to use grep to find
those pages.
To find these pages, we used:
$ grep -rn wrapper man? | sort -V
and
$ grep -rni support.*glibc | sort -V
delete_module.2, init_module.2: glibc 2.23 is no longer
maintained, so we changed the notes about wrappers, to say that
there are no glibc wrappers for these system calls; see NOTES.
We didn't fix some obsolete pages such as create_module.2.
Signed-off-by: Ganimedes Colomar <gacoan.linux@gmail.com>
Cowritten-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rationale:
$ man 7 man-pages 2>/dev/null | sed -n /Paragraphs/,/^$/p
Paragraphs should be separated by suitable markers (usually
either .PP or .IP). Do not separate paragraphs using blank
lines, as this results in poor rendering in some output
formats (such as PostScript and PDF).
Fix:
$ sed -i -e '1,/^\.EX/s/^$/.PP/' -e '/^\.EE/,/^\.EX/s/^$/.PP/' man?/*
And then some manual adjustments.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I noticed this while working on some silly "hello, world"
programs, see https://git.sr.ht/~phf/hello-again if you're
curious. Disassembling sh4 code showed trap #31 all over the
place but the syscall(2) man page talked about trap #0x17 and
friends. Checking the kernel sources I got lucky in
arch/sh/kernel/entry-common.S where in commit 3623d138213ae Rich
Felker clarifies the situation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Martin Sebor <msebor@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Files moved from .txt to .rst.
Also, drop / prefix from kernel source tree references.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The main point I was driving at with this patch was to fix
"Microsoft Window's FAT filesystems" (i.e., FAT filesystems which
belong to Microsoft Windows, which is decidedly wrong).
FAT32 first shipped with MS-DOS 7.1, as part of Windows 95 OSR2,
but it's a (relatively) simple logical extension of the previous
FATx filesystems (16 and 12 as we know and love them today, I
don't think the PC ever saw 8), hence the "VFAT" driver name ‒
calling FAT-anything a Windows filesystem would be a flat-out lie,
calling it a Microsoft filesystem would be, uh, facetious.
NTFS (as part of Windows NT), on the other hand, is wholly
different WRT the scope and feature-set (it does borrow some
layouting from FAT, but reading NTFS as FAT doesn't get you very
far, or much).
The replacing bit is also questionable, especially in a.d. 2020:
while it is true that you cannot install NT on FAT (after a
certain point? my memory ain't what it used to be), and must
therefore replace your existing FAT partitions with NTFS during
upgrades; Windows NT 4.0, the last product to be NT-branded came
out in 1996, i.e. you could not install Windows on FAT (and,
therefore, upgrade it to NTFS, replacing it) during my entire
lifetime.
Indeed, in $(date +%Y) we live in a post-NTFS world ‒ putting NTFS
in the same class as FAT beyond "is a filesystem" is a joke.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Relevant Linux commits:
* moved to staging in 1bb8155080c652c4853e6228f8f0d262b3049699
(describe: v4.15-rc1-129-g1bb8155080c6) in Nov 2017,
described as "broken" and "obsolete"
* purged in bd32895c750bcd2b511bf93917bf7ae723e3d0b6
(describe: v4.17-rc3-1010-gbd32895c750b) in Jun 2018,
"since no one has complained or even noticed it was gone"
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux kernel 3.12, tcp_syncookies can have the value 2,
which sends out cookies unconditionally.
Related kernel commits:
5ad37d5deee1ff7150a2d0602370101de158ad86
d8513df2598e5142f8a5c4724f28411936e1dfc7
Reported-by: Philip Rowlands <linux-kernel@dimebar.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Heinrich:
The strnlen.3 manpage has the following sentence:
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s+maxlen."
This sentence is self-contradictory:
The last visited character implied by "first maxlen
characters" is s[maxlen-1].
Given that "beyond a" does not include "a", the last visited
character implied by "never beyond s+maxlen" is s[maxlen].
A consistent sentence would be
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s+maxlen-1."
I would prefer
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s[maxlen-1]"
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses 'int' instead of 'long' for the return type.
As glibc provides no wrapper, use the same type the kernel uses.
......
$ grep -n wrapper man-pages/man2/subpage_prot.2
40:There is no glibc wrapper for this system call; see NOTES.
99:Glibc does not provide a wrapper for this system call; call it using
$ grep -rn SYSCALL_DEFINE.*subpage_prot linux/;
linux/arch/powerpc/mm/book3s64/subpage_prot.c:190:
SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
linux/arch/powerpc/mm/book3s64/subpage_prot.c \
|grep return;
return -ENOENT;
return -EINVAL;
return -EINVAL;
return 0;
return -EFAULT;
return -EFAULT;
return err;
$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
linux/arch/powerpc/mm/book3s64/subpage_prot.c \
|grep '\<err\>';
int err;
err = -ENOMEM;
err = -ENOMEM;
err = 0;
return err;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This paragraph is a little bit hidden at the end of DESCRIPTION;
make it a little more prominent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit aae8a97d3ec30788790d1720b71d76fd8eb44b73 (part
of kernel release v2.6.39) added a check to disallow creating a
hardlink to an unlinked file.
The manual page already describes the trick of using
AT_SYMLINK_FOLLOW as an alternative to AT_EMPTY_PATH, and for
AT_EMPTY_PATH the manual page already notes that it "will
generally not work if the file has a link count of zero". However,
the precise error (ENOENT) is not mentioned, and the error case
isn't mentioned in the ERRORS section at all.
This makes it easy to overlook the fact that the AT_SYMLINK_FOLLOW
trick on /proc/self/fd/NN won't work on deleted files, as
evidenced by the follow message (which turns up when googling
"linkat deleted ENOENT"):
https://groups.google.com/g/linux.kernel/c/zZO4lqqwp64
Signed-off-by: Mathias Rav <m@git.strova.dk>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses 'pid_t' instead of 'long' for the return type.
As glibc provides no wrapper, use the same types the kernel uses.
$ sed -n 34,36p man-pages/man2/set_tid_address.2
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
$ grep -rn 'SYSCALL_DEFINE.*set_tid_address' linux/
linux/kernel/fork.c:1632:
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
$ sed -n 1632,1638p linux/kernel/fork.c
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
{
current->clear_child_tid = tidptr;
return task_pid_vnr(current);
}
$ grep -rn 'task_pid_vnr(struct' linux/
linux/include/linux/sched.h:1374:
static inline pid_t task_pid_vnr(struct task_struct *tsk)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Learned from an email converasation with Mike Crowe, and
verified by experiment.
Reported-by: Mike Crowe <mac@mcrowe.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see, it is instead simply an alias for a
wrapper that calls _llseek(). Saying it's an alias for "llseek()"
seems confusing.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We're talking of a mix of wrapper functions and system calls
in this page. lseek() is both a system and a wrapper function,
and this page is mostly describing the wrapper
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
{.IR var [x]} -> {.I var[x]}
There were around 15 entries of the former,
and around 360 of the latter.
Found using:
$ grep -rn '^\.I[ |R].* \[.*\]' |sort
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Found using:
$ grep -rn '\\f., [^ ]*\\f. and' man?
I also updated the markup in that paragraph: \f -> .
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the comment of the example program, the peer blocks on fwait()
rather than fpost().
Signed-off-by: Jing Peng <pj.hades@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than mention these pages under the discussion of one
version of the standard, move that text to the end of the page,
where it is probably a little more obvious.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linking up the info presented on this page with the discussion
in getcontext(3) helps the reader.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The script can be used this way:
git commit -sm "$(./scripts/modified_pages.sh): Short commit msg"
And then maybe --amend and add a longer message.
This is especially useful for changes to many pages at once,
usually when running a script to apply some global changes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With this change, there remain almost no vestiges of information
about the long defunct Linux libc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[.B XX_*] is the most extended form in the pages.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
faccessat2() was added in Linux 5.8 and enables a fix to
longstanding bugs in the faccessat() wrapper function.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The include path is linux/openat2, so fix the manual to reference
this correct path.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses a long as the return type for this syscall.
As glibc provides no wrapper, use the same types the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is probably more sensible to place this section after
the subsection "Signal mask and pending signals".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a "big picture" of what happens when a signal handler
is invoked.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel sources (kernel/fork.c::copy_process()), we have:
/*
* sigaltstack should be cleared when sharing the same VM
*/
if ((clone_flags & (CLONE_VM|CLONE_VFORK)) == CLONE_VM)
sas_ss_reset(p);
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=12665.
The fix by Thomas Gleixner was in kernel commit
78c9c4dfbf8c04883941445a195276bb4bb92c76.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit 2a36ab717e8fe678d98f81c14a0b124712719840
(part of 5.10 release) changed sys_membarrier prototype/parameters
and added two new commands [MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
and MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ]. This
man-pages patch reflects these changes, by mostly copying comments
from the kernel patch into the man-page ([Peter Oskolkov] was also
the author of the kernel change).
[mtk: commit message tweaked]
Signed-off-by: Peter Oskolkov <posk@google.com>
Cowritten-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux uses aio_context_t for these syscalls,
and it's the type provided by <linux/aio_abi.h>.
Use it in the SYNOPSIS.
libaio uses 'io_context_t', but that difference is already noted
in NOTES.
[mtk: patch slightly tweaked]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses long as the return type for this syscall.
As glibc provides no wrapper, use the same type the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a page documenting the pthread_attr_setsigmask_np(3) and
pthread_attr_getsigmask_np() functions added in glibc 2.32.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
That comment wrapped on an 80-column terminal.
Divide it into two lines.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses the following:
kernel/futex.c:3778:
SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
u32, val3)
Since there is no glibc wrapper, use the same types the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: Applied patch manually]
getdents():
This function has no glibc wrapper.
As such, we should use the same types the Linux kernel uses:
Use 'long' as the return type.
getdents64():
The glibc wrapper uses:
ssize_t getdents64(int, void *, size_t);
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text in NOTES doesn't really relate specifically to
the #include, so remove the comment on the #include.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extend this page with the information about CAP_PERFMON capability
designed to secure performance monitoring and observability
operation in a system according to the principle of least
privilege [1] (POSIX IEEE 1003.1e, 2.2.2.39).
[1] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Format section names inside each type.
Follow the same pattern as in stat.2 (see line 158: ".IR Note :")
Before this ffix, it was visually harder to find sections inside a type.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CAP_BPF, CAP_PERFMON, and CAP_CHECKPOINT_RESTORE have all been
added to split out the power of CAP_SYS_ADMIN into weaker pieces.
Group all of these capabilities together in the list under
CAP_SYS_ADMIN, to make it clear that there is a pattern to these
capabilities.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux 5.9, CONFIG_CHECKPOINT_RESTORE also allows writing to
/proc/sys/kernel/ns_last_pid.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
makedev(3) provides much more detail on this type, so mention it
in the description rather than in 'See also'.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
queue.3 has been moved to queue.7.
Fix SEE ALSO accordingly.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
queue has been for so many years in Section 3,
and still is in Section 3 in most manuals.
For legacy reasons,
especially because hyperlinks to the online manual pages
would break otherwise,
a link queue.3 -> queue(7) is necessary.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After forking slist.3, list.3, tailq.3, stailq.3 & circleq.3
in the previous commits,
this page no longer belongs in Section 3 of the manual pages.
According to its contents, the most suitable section is Section 7.
Because of legacy reasons, a link queue.3 -> queue(7)
would be appropriate.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This helps differentiate 'TYPE' in some arguments from
'struct TYPE *var' in others, and is technically more correct.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- ffix: Use man markup
- Remove specific notes about code size increase
and execution time increase,
as they were (at least) inaccurate.
Instead, a generic note has been added.
- Structure the text into subsections.
- Remove sections that were empty after the forks.
- Clearly relate macro names (SLIST, TAILQ, ...)
to a human readable name of which data structure
they implement.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the section for /proc/[pid]/smaps, the description of field
ProtectionKey occurs twice: both before and after the description of
VmFlags.
Changes made by this patch:
1) Only the first occurrence is kept because its order matches the
output of /proc/[pid]/smaps.
2) The kernel version that CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS was
introduced is only mentioned in the second occurrence. Now it's moved
to the first one.
Signed-off-by: Jing Peng <pj.hades@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add remaining details to complete the page.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A few fixes to note:
- Sorted alphabetically some macros
- ffix: remove alignment spaces in example (as in list.3)
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: NAME: Add description
list.3: DESCRIPTION: Add short description
list.3: SEE ALSO: Add insque(3) and queue(3)
list.3: BUGS: Note LIST_FOREACH() limitations
list.3: RETURN VALUE: Add details about the return value of those macros that "return" a value
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: NAME: ffix: Use man markup
list.3: SYNOPSIS: ffix: Use man markup
list.3: DESCRIPTION: ffix: Use man markup
list.3: DESCRIPTION: ffix: Use man markup
list.3: CONFORMING TO: ffix: Use man markup
list.3: EXAMPLES: ffix: Use man markup
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: SYNOPSIS: Copy include from queue.3
list.3: DESCRIPTION: Copy description about naming of macros from queue.3
list.3: DESCRIPTION: Remove unrelated code to adapt to this page
list.3: DESCRIPTION: Remove lines pointing to the EXAMPLES
list.3: CONFORMING TO: Copy from queue.3
list.3: CONFORMING TO: Adapt to this page
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Time namespaces were added in kernel 5.6, but setns() support
for time namespaces was added only starting with kernel 5.8:
commit 76c12881a38aaa83e1eb4ce2fada36c3a732bad4
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date: Mon Jul 6 17:49:11 2020 +0200
nsproxy: support CLONE_NEWTIME with setns()
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Paul Eggert commented on a patch that proposed to note the
POSIX.2001 details:
No actual POSIXish implementation ever made it a
real-floating type, though, and that point should be made
lest some conscientious programmer worry about a nonexistent
porting issue.
We opted to drop the patch, but in case someone else points out
this POSIX.1-2001 difference in the future, let's leave a comment
in the page source.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Describe the activation of the Kernel Lockdown feature via Kconfig
and the command line.
Cf. Documentation/admin-guide/kernel-parameters.rst.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Highlight to the reader that if another filter returns a
higher-precedence action value, then the ptracer will not
be notified.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The wording was incorrect:
It stated that 'eflags' may be the OR of one or two of those two flags,
but then a third flag was documented
(which according to the previous wording could not be used?!).
Moreover, the wording also disallowed using 0 (i.e., no flags at all),
which POSIX specifically allows;
I tested the function with no flags and it worked fine for me,
so I guess it was a problem with the documentation,
and not with the implementation itself.
POSIX ref: https://pubs.opengroup.org/onlinepubs/9699919799/
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I added the EXAMPLES section.
The examples in this page are incomplete
(you can't copy&paste&compile&run).
I fixed the one about TAILQ first,
and the rest should follow.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX requires that the <regex.h> header shall define
the structures and symbolic constants used by the
regcomp(), regexec(), regerror(), and regfree() functions.
Therefore, there should be no need to include <sys/types.h>
at all.
The POSIX docs don't use that include:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is implied by POSIX because it requires that these strings in
the locale definition file contain one symbol. Currently,
locale.5 does not document the concept of symbols, this change
glosses over that and just uses the term "single-character
string".
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Revert "uint_least8_t.3, uint_least16_t.3, uint_least32_t.3, uint_least64_t.3, uint_leastN_t.3: New links to system_data_types(7)"
This reverts commit a5d13a32b7.
Revert "system_data_types.7: Add uint_leastN_t family of types"
This reverts commit 3450a5621e.
Revert "int_least8_t.3, int_least16_t.3, int_least32_t.3, int_least64_t.3, int_leastN_t.3: New links to system_data_types(7)"
This reverts commit 876838354d.
Revert "system_data_types.7: Add int_leastN_t family of types"
This reverts commit f9b54d3a2e.
Revert "uint_fast8_t.3, uint_fast16_t.3, uint_fast32_t.3, uint_fast64_t.3, uint_fastN_t.3: New links to system_data_types(7)"
This reverts commit 496b1aad79.
Revert "system_data_types.7: Add uint_fastN_t family of types"
This reverts commit 3c9ae6e5a2.
Revert "int_fast8_t.3, int_fast16_t.3, int_fast32_t.3, int_fast64_t.3, int_fastN_t.3: New links to system_data_types(7)"
This reverts commit 9df81a23e5.
Revert "system_data_types.7: Add int_fastN_t family of types"
This reverts commit 8f12d3f683.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: Jonathan Wakely <jwakely.gcc@gmail.com>
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Through some accident, 'sys_siglist' has been documented in
two different pages. Consolidate the information to one page
(strsignal(3)) and add 'sys_siglist" to the NAME line of that
page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We used .br to force a simple line break (with no extra blank line)
after the tag.
Recently, we added .RS/.RS, and .RS comes just after the tag,
and I realized by accident that .RS already forces a simple line break.
Therefore, .br is completely redundant here, and can be removed.
This way we get rid of "raw" *roff requests in this page.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous format/wording for the includes wasn't very clear.
Improve it a bit following Branden's proposal.
It also helps reduce lines of code.
Add a subsection in NOTES explaining the conventions used.
Remove the comment for helping maintain the page,
as the NOTES section is now clear enough.
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Reported-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Due to a userspace breakage, commit 1251201c0d34 ("sched/core: Fix
uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic
and code") changed the semantics of sched_getattr(2) when the
userspace struct is smaller than the kernel struct. Now, any
trailing non-zero data in the kernel structure is ignored when
copying to userspace. We also document the original error code
correctly (it was EFBIG not E2BIG) in the BUGS section.
Ref: 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
robustify sched_read_attr() ABI logic and code")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Around the text:
"Feature Test Macro Requirements for glibc..."
replace ".in -4n/.in" with ".RS -4/.RE".
The latter form is more idiomatic use of man macros.
The nroff output is unchanged.
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Returning its argument without further checks is almost always
wrong for la_version.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The contents of each type are a logical block that is indented as
a block. They are not separate paragraphs that happen to be
indented separately, but a set of continuous paragraphs, all at
the same level, indented as a block from the list entry--the name
of the type--. Therefore, it makes more sense to use block
indentation, represented by .RS/.RE, rather than indenting each
paragraph separately. That way it's also easier to further indent
a separate paragraph inside a block, which happens for example in
the case of float_t & double_t. It's simply much easier now to
use .IP specifically in those cases where you want to indent just
a single paragraph.
Also added an ending separator comment line just after the last
type.
[mtk: minor edits to commit message]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From the email discussion:
> Hi Alex,
>
> On 9/25/20 9:31 AM, Alejandro Colomar wrote:
>> Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
>> ---
>> man2/seccomp.2 | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/man2/seccomp.2 b/man2/seccomp.2
>> index 58033da1c..d6b856c32 100644
>> --- a/man2/seccomp.2
>> +++ b/man2/seccomp.2
>> @@ -1101,7 +1101,7 @@ install_filter(int syscall_nr, int t_arch, int f_errno)
>> };
>>
>> struct sock_fprog prog = {
>> - .len = (unsigned short) (sizeof(filter) / sizeof(filter[0])),
>> + .len = sizeof(filter) / sizeof(filter[0]),
>> .filter = filter,
>> };
>
> I have a small doubt about this change. With the change,
> there are no compilation warnings.
>
> But, if we change the code to something slightly different:
>
> [[
> size_t x = (sizeof(filter) / sizeof(filter[0]));
> struct sock_fprog prog = {
> .len = x,
> .filter = filter,
> };
> ]]
>
> The "cc -Wconversion" gives us the following warning:
>
> warning: conversion from ‘size_t’ {aka ‘long unsigned int’}
> to ‘short unsigned int’ may change value
>
> Presumably we don't get a warning for an assignment of the form
>
> .len = (sizeof(filter) / sizeof(filter[0]))
>
> because the compiler is smart enough to work out that the
> value of the constant expression is within the range of
> "unsigned short".
>
> Your thoughts?
Hi Michael,
I'd say that the cast doesn't fix any problems at all. It silences a
valid warning, and I'd use a pragma for that (to be more explicit about
the intention of silencing a warning) if I do want -Wconversion enabled
(which usually I don't want, because it's too noisy) and I'm sure that
this won't overflow. I'd limit the use casts to only when I *really*
need to.
I guess that if you enable -O3, the warning will vanish again because
the compiler will optimize away 'x' (but I didn't test).
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use \(aq to get an unslanted single quote inside monospace code
blocks. Using a simple ' results in a slanted quote inside PDFs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These variables are either of an unsigned integer type per POSIX;
or of an integer type per POSIX, that Linux defines as an unsigned integer type.
Print them with 'uintmax_t' instead of 'intmax_t' to avoid
big positive numbers being printed as negative numbers.
Bug report:
From: Konstantin Bukin @ 2020-09-13 15:04 UTC
To: mtk.manpages; +Cc: Konstantin Bukin, linux-man
inode numbers are expected to be positive. Casting them to a signed type
may result in printing negative values. E.g. running example program on
the following file:
$ ls -li test.txt
9280843260537405888 -r--r--r-- 1 kbukin hardware 300 Jul 21 06:36 test.txt
results in the following output:
$ ./example test.txt
ID of containing device: [0,480]
File type: regular file
I-node number: -9165900813172145728
Mode: 100444 (octal)
Link count: 1
Ownership: UID=2743 GID=30
Preferred I/O block size: 32768 bytes
File size: 300 bytes
Blocks allocated: 8
Last status change: Tue Jul 21 06:36:50 2020
Last file access: Sat Sep 12 14:13:38 2020
Last file modification: Tue Jul 21 06:36:50 2020
Such erroneous reporting happens for inode values greater than maximum
value which can be stored in signed long. Casting does not seem to be
necessary here. Printing inode as unsigned long fixes the issue.
Reported-by: Konstantin Bukin <kbukin@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text comes straight from POSIX. In copyright terms,
this is at least a gray area, and in any case, simply reproducing
the text of the standard has limited value, since people can
consult the standard directly. So, rewrite the text, to simply
quote the description from fenv(3).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The implementation shall support one or more programming
environments where these types are no wider than 'long'.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A limit can be defined by other than POSIX
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add note about length modifiers and conversions to [u]intmax_t,
and add a corresponding example.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note: There are a few members of this structure that are
not required by POSIX (XSI extensions, and such).
I simply chose to not document them at all.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Various ports that had their own indigenous system calls have
been discontinued. Remove those system calls (none of which had
manual pages!) to a separate part of the page, to avoid
cluttering the main list of system calls.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the links removed in commit 247c654385 should
have been kept, because in some cases there are real system
calls whose wrapper functions are documented in Section 3.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SO_PEERSEC was introduced for AF_UNIX stream sockets connected via
connect(2) in Linux 2.6.2 [1] and later augmented to support
AF_UNIX stream and datagram sockets created via socketpair(2) in
Linux 4.18 [2]. Document SO_PEERSEC in the socket.7 and unix.7
man pages following the example of the existing SO_PEERCRED
descriptions. SO_PEERSEC is also supported on AF_INET sockets
when using labeled IPSEC or NetLabel but defer adding a
description of that support to a separate patch.
The module-independent description of the security context
returned by SO_PEERSEC is from Simon McVittie.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=da6e57a2e6bd7939f610d957afacaf6a131e75ed
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b811db2cb2aabc910e53d34ebb95a15997c33e7
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cowritten-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The initial version documents sigval, ssize_t, suseconds_t,
time_t, timer_t, timespec, and timeval.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: the coding style used in the example could lead people to
inject memory leaks in their code if they cut/paste/modify the
code to replace "exit" paths with "return" paths from a library
function.]
[Marko, from the mail thread discussing this patch:]
You are right about terminating the process. However, people copy
that example and put the code in a function changing "exit" to
"return". There are a bunch of examples like that here
https://beej.us/guide/bgnet/html/#poll, for instance. That error
bothered me when reading the network programming guide
https://beej.us/guide/bgnet/html/. Than I looked for information
elsewhere:
https://stackoverflow.com/questions/6712740/valgrind-reporting-that-getaddrinfo-is-leaking-memoryhttps://stackoverflow.com/questions/15690303/server-client-sockets-freeaddrinfo3-placement
And finally, I checked manual pages and saw where these errors
come from.
When you change that to a function and return without doing
freeaddrinfo, that is a memory leak. I believe an example should
show good programming practices. Relying on exiting and clearing
the memory in that case is not such a case. In my opinion, these
examples lead people to make mistakes in their programs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Let's move to the 21st century. Instead of casting system data
types to long/long long/etc. in printf() calls, instead cast to
intmax_t or uintmax_t, the largest available signed/unsigned
integer types.
[mtk: rewrote commit message]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use printf()'s '#' flag character to prepend the string "0x".
However, when the number is printed in uppercase, and the prefix
is in lowercase, the string "0x" needs to be manually written.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For consistency.
The types are written both with and without the redundant 'int' keyword
all over the man-pages. However, the most used form, by far, is the one
without 'int'.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Member 'tv_nsec' of 'struct timespec' is of type 'long' (see time.h.0p),
and therefore, the cast is completely redundant.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'nread' is of type 'ssize_t'
'tot' adds up different values contained in 'nread',
so it should also be 'ssize_t', and not 'int' (which possibly overflows).
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Notes: I copied .nf and .fi from futex.2, but they made no visual difference.
What do they actually do?
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For consistency.
Most man pages use 'long' instead of 'long int'.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is the DT_RUNPATH/DT_RPATH of the calling object (not the
executable) that is relevant for the library search. Verified
by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When calling msgrcv() with the MSG_COPY flag, it will report
EINVAL error even we if have disabled CONFIG_CHECKPOINT_RESTORE.
ENOSYS will be reported only if we also specify the IPC_NOWAIT
flag.
[mtk: edited commit message]
Notes from mtk:
The relevant kernel code is this:
[[
#ifdef CONFIG_CHECKPOINT_RESTORE
...
#else
static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz)
{
return ERR_PTR(-ENOSYS);
}
...
static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long
msgtyp, int msgflg,
long (*msg_handler)(void __user *, struct msg_msg *, size_t))
{
...
if (msgflg & MSG_COPY) {
if ((msgflg & MSG_EXCEPT) || !(msgflg & IPC_NOWAIT))
return -EINVAL;
copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax));
...
}
]]
We'll only hit the ENOSYS error if:
(1) MSG_COPY was specified;
(2) IPC_NOWAIT was not specified; and
(3) CONFIG_CHECKPOINT_RESTORE was not enabled.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fread(3), unlike read(2) which returns a ssize_t, returns a
size_t. It doesn't distinguish between error and enf-of-file.
Instead, either ferror(3) or feof(3) need to be checked if fread()
returned 0.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
`p1` (and `p2` too) is `const void *` and it comes from a
`const char **` (for legacy reasons, argv is not `const` but should be
treated as if it were). That means, the ultimate `char` is `const`:
"a pointer to a pointer to a const char".
Let's see what is going on before the fix first, and then the fix.
Before the fix:
`(char *const *)` (I removed the space on purpose) casts `p1` to be
"a pointer to a const pointer to a non-const char". That's clearly
not what it originally was.
Then we dereference, ending with a `char *const`, which is
"a const pointer to a non-const char". But given that the pointer value
is passed to a function, `const` doesn't make sense there, because the
function will already take a copy of it, so it is impossible to modify
the pointer itself.
The fix:
`(const char **)` The only thing that is const is the ultimate `char`,
which is the only thing that matters, because it is the only thing
strcmp(3) has access to (everything else, i.e. the pointers, are
copies).
Then, after the dereference we end up with `const char *`, the type of
argv (more or less, as previously noted), which is also the type of the
arguments to strcmp(3).
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `const void *` to `struct mi *` should result in a warning if
done implicitly. The explicit cast was probably silencing that warning.
`const` can and should be kept.
Now, casting `const void *` to `const struct mi *` is done implicitly.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `void *` to `struct child_args *` is already done implicitly.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `struct msgbuf *` is implicitly casted to `const void *`.
Not only that, but the explicit cast to `void *` was slightly
misleading.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `sigset_t *` is implicitly casted to `void *`.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `struct sockaddr_nl *` is implicitly casted to `void *`.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `void *` to `double (*cosine)(double)` is already done
implicitly.
I had doubts about this one, but `gcc -Wall -Wextra` didn't complain
about it.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type of `val` is `int **`, and it will work with tsearch()
anyway because of implicit cast from `void *`, so declaring it as an
`int **` simplifies the code.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It doesn't make any sense to pass a pointer to the array to
read(2).
It might make sense to pass a pointer to the first element of the
array, but that is already implicitly done when passing the array,
which decays to that pointer, so it's simpler to pass the array.
And anyway, the cast was unneeded, as any pointer is implicitly
cast to `void *`.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `int *` to `const void *` is already done implicitly.
Not only that, but the explicit cast to `void *` was slightly
misleading.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than:
sometype x;
for (x = ....; ...)
use
for (sometype x = ...; ...)
This brings the declaration and use closer together (thus aiding
readability) and also clearly indicates the scope of the loop
counter variable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than writing things such as:
struct sometype *x;
...
x = malloc(sizeof(*x));
let's use C99 style so that the type info is in the same line as
the allocation:
struct sometype *x = malloc(sizeof(*x));
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The same calculations are repeated in malloc() and printf() calls.
For better readability, do the calculations once.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Hi Michael,
Continuing with the series, this is the first of the last set of
patches: (2).1 as numbered in previous emails.
Regards,
Alex.
------------------------------------------------------------------------
>From ad5f958ed68079791d6e35f9d70ca5ec2a72c43b Mon Sep 17 00:00:00 2001
From: Alejandro Colomar <colomar.6.4.3@gmail.com>
Date: Thu, 3 Sep 2020 12:11:18 +0200
Subject: [PATCH] memusage.1: Use sizeof consistently
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document fanotify_init(2) flag FAN_REPORT_NAME and the format of
the event info type FAN_EVENT_INFO_TYPE_DFID_NAME.
The fanotify_fid.c example is extended to also report the name of
the created file or subdirectory.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document fanotify_init(2) flag FAN_REPORT_DIR_FID and event info
type FAN_EVENT_INFO_TYPE_DFID.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With fanotify_init(2) flag FAN_REPORT_FID, the group identifies
filesystem objects by file handles in a single event info record
of type FAN_EVENT_INFO_TYPE_FID.
We intend to add support for new fanotify_init(2) flags for which
the group identifies filesystem objects by file handles and add
more event info record types.
To that end, start by changing the language of the man page to
refer to a "group that identifies filesystem objects by file
handles" instead of referring to the FAN_REPORT_FID flag and
document the extended event format structure in a more generic
manner that allows more than a single event info record and not
only a record of type FAN_EVENT_INFO_TYPE_FID.
Clarify that the object identified by the file handle refers to
the directory in directory entry modification events.
Remove a note about directory entry modification events and
monitoring a mount point that I found to be too confusing and out
of context.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Jakub points out that my last resync may accidentally have been
against an old version of the kernel source, since the resync
resulted in many deleted lines. I suspect he may be right.
Let's resync against today's current kernel.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the
following way:
- When the result of ``sizeof`` is multiplied (or otherwise
modified), write ``sizeof`` in the first place.
Rationale:
``(sizeof(x) * INT_MAX * 2)`` doesn't overflow.
``(INT_MAX * 2 * sizeof(x))`` overflows, giving incorrect
results.
As a side effect, the parentheses of ``sizeof`` are not next to
the parentheses of the whole expression, and it is visually
easier to read.
Detailed rationale:
In C, successive multiplications are evaluated left to right (*),
and therefore here is what happens (assuming x86_64):
``(sizeof(x) * INT_MAX * 2)``:
1) sizeof(x) * INT_MAX (the type is the largest of both, which
is size_t (unsigned long; uint64_t)).
2) ANS * 2 (the type is again the largest: size_t)
``(INT_MAX * 2 * sizeof(x))``:
1) INT_MAX * 2 (the type is the largest of both, which is
int as both are int (int; int32_t), so the
result is already truncated as it doesn't fit
an int; at this point, the intermediate result
will be 2^32 - 2 (``INT_MAX - 1``) (if I did
the math right)).
2) ANS * 2 (the type is again the largest of both: size_t;
however, ANS was already incorrect, so the
result will be an incorrect size_t value)
(*): https://en.cppreference.com/w/c/language/operator_precedence
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use 'struct timespec', not 'struct timeval', and adjust
the variable name accordingly.
Reported-by: Tony May <tony.may@mediakind.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the
following way:
- Never use a space after ``sizeof``, and always use parentheses
around the argument.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#spaces
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I run ``sudo make`` and then visualized the man page with
``man 3 queue``, and the contents looked good.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In C89 strtod returns zero on underflow, but since C99 it can return
non-zero. This means the strtod.3 page contradicts all recent C and
POSIX standards. Both C and POSIX say "smallest normalized positive
number", but for consistency with HUGE_VAL, HUGE_VALF and HUGE_VALL
this patch uses the constants for those numbers.
Also slightly improve the presentation of return values for overflow.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of hexadecimal floating-point output is missing a
character describing the exponent. The guarantee of at least one digit
in the exponent is present in both C99 and POSIX.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Killing a thread with SECCOMP_RET_KILL_THREAD is very likely
to leave the rest of the process in a broken state.
Wording pretty much taken from Rick Felker's suggestion.
Reported-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This header is used inconsistently -- man pages are UTF-8 encoded
but not setting this marker. It's only respected by the man-db
package, and seems a bit anachronistic at this point when UTF-8
is the standard default nowadays.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was code containing ``CIRCLEQ_*`` in the examples for ``TAILQ_*``. It was introduced by accident in commit ``041abbe``.
From 0c9dfbe9b1ce1130e9a92d1a16fbecd4a08bbe29 Mon Sep 17 00:00:00 2001
From: Alejandro Colomar <colomar.6.4.3@gmail.com>
Date: Wed, 12 Aug 2020 09:11:27 +0200
Subject: [PATCH] queue.3: Remove wrong code from example
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A naked tilde ("~") renders poorly in PDF. Instead use "\(ti",
which renders better in a PDF, and produces the same glyph
when rendering on a terminal.
Reported-by: Geoff Clare <gwc@opengroup.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
===========
DESCRIPTION
===========
I'm documenting ``CIRCLEQ_*`` macros in queue.3. While writing
this, I noticed that the documentation for some types of
queues/lists talked about swapping contents of two lists, but only
for some of them. I then found that those macros (``*_SWAP``)
don't exist in my system (Debian), but exist in BSD, and I also
found that a previous commit (6559169cac) commented out a lot of
the *_SWAP macros documentation, but not all, and the reason was
that they were not present on glibc.
I checked that I didn't have any of the *_SWAP macros on my glibc,
so I think this is probably that the commit simply forgot to
comment some of
them.
=======
TESTING
=======
I did ``sudo make`` and then visualized the man page with
``man 3 queue``, and the changes looked good.
I also noticed that the subsection ``Tail queue example`` contents
were wrong, as they contained calls to CIRCLEQ_* macros. I will
address that in a future patch, before I submit the patch
documenting CIRCLEQ_*.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explicitly mention CONFIG_LEGACY_PTYS, and note that it is disabled
by default since Linux 2.6.30.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The paragraph noting applications that use pseudoterminals is better
placed in NOTES than in the DESCRTIPTION.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Historically, a comment of the following form at the top of a
manual page was used to indicate too man(1) that the use of tbl(1)
was required in order to process tables:
'\" t
However, at least as far back as 2001 (according to Branden),
man-db's man(1) automatically uses tbl(1) as needed, rendering
this comment unnecessary. And indeed many existing pages in
man-pages that have tables don't have this comment at the top of
the file. So, drop the comment from those files where it is
present.
[mtk: completely rewrote commit message]
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some paragraph breaks to the discussion of 'mode' to make
the details a bit easier to read.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The \" comment produces blank lines. Use the .\" that the vast
majority of the codebase uses instead.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The subsection named "Rpath token expansion" was renamed to
"Dynamic string tokens". Update the cross-references inside
the page accordingly.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This happens for more than just DT_RPATH/DT_RUNPATH.
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
"/bin/pwd" happens to work with the GNU coreutils implementation,
which has -P as the default, contrary to POSIX requirements.
Use "pwd -P" instead, which is shorter, easier to type, and should
work everywhere.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CAP_SYS_RESOURCE also allows overriding /proc/sys/fs/mqueue/msg_max
and /proc/sys/fs/mqueue/msgsize_max.
Signed-off-by: Saikiran Madugula <hummerbliss@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify that it is recursive read locks on the read-write lock
that make it difficult to implement
PTHREAD_RWLOCK_PREFER_WRITER_NP.
Update the libc-alpha URL and provide the URL to the POSIX wording
that is quoted in the comment.
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Close the PID file descriptor in the example program, to hint to
the reader that like every other kind of file descriptor, a PID FD
should be closed.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '--' to '\-\-' for options and '--' between words to '\(em'
(em-dash).
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
msg_iovlen is incorrectly typed (according to POSIX) in addition
to msg_controllen, but unlike msg_controllen, this wasn't
mentioned for msg_iovlen.
msg_iovlen being incorrectly typed hasn't been reported as a GCC
bug, but there's no point since it is caused by the same
underlying issue.
Sources: POSIX.1-2017 (<sys/socket.h>), Linux
(include/linux/socket.h)
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See kernel commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
and the in fs/namespace.c::do_loopback():
err = -EINVAL;
if (mnt_ns_loop(old_path.dentry))
goto out;
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Change '-' to '\(en' for a range.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
cgroups-v1/v2 documentation got moved to the "admin-guide" subfolder
and converted from .txt files to .rst
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update description of permissions for port-mapped I/O set
per-thread and not per-process. Mention that iopl() can not
disable interrupts since Linux 5.5 anymore and is in general
deprecated and only provided for legacy X servers.
See https://bugzilla.kernel.org/show_bug.cgi?id=205317
Reported-by: victorm007@yahoo.com
Signed-off-by: Thomas Piekarski <t.piekarski@deloquencia.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the the PR_SET_TAGGED_ADDR_CTRL and
PR_GET_TAGGED_ADDR_CTRL prctls added in Linux 5.4 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the the PR_SVE_SET_VL and PR_SVE_GET_VL
prctls added in Linux 4.15 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A patch has been merged for v5.8 that changes how syncfs() reports
errors. Change the sync() manpage accordingly.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The host ID might once have been intended to be globally unique,
but that turned out not to feasible.
Reported-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If this was ever going to change the test case is very simple:
$ cd /tmp
$ touch libc.so.6
$ LD_LIBRARY_PATH=: sh
sh: error while loading shared libraries: libc.so.6: file too short
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by mail from Geoff Clare, there are some details that
need correcting:
Subject: standards(7) (was: man-pages-5.07 released)
Date: Wed, 10 Jun 2020 10:53:14 +0100
From: Geoff Clare <gwc@opengroup.org>
...
The first isn't really a problem, just an oddity. You list
POSIX.1b as "formerly known as POSIX.4", but you don't do the
equivalent for POSIX.1c ("formerly known as POSIX.4a").
There are several problems with the XPG3 entry:
"first significant release" - although I suppose XPG3 could
be considered more significant than XPG2 because it was the
first one to incorporate POSIX.1, I don't think it's fair to
imply that XPG2 was not significant. (E.g. XPG2 was
significant in that it was the first release to include
I18N, and the first that had a conformance test suite.)
"produced by the X/Open Company, a multivendor consortium" -
this conflates two different things called X/Open. X/Open
Company Limited is the UK company that did the editing work,
organised meetings, etc. X/Open Group is the consortium
whose members developed the technical content.
"This multivolume guide was based on the POSIX standards" -
at the time there was only one POSIX standard, namely
POSIX.1-1988. The first release to incorporate POSIX.2 was
XPG4 (which you may consider worth noting in the XPG4
entry).
To fix these problems I would suggest changing the entry to:
XPG3 Released in 1989, this was the first release of the X/Open
Portability Guide to be based on a POSIX standard
(POSIX.1-1988). This multivolume guide was developed by the
X/Open Group, a multivendor consortium.
Under SUSv2 I would suggest changing:
Sometimes also referred to as XPG5.
to:
Sometimes also referred to (incorrectly) as XPG5.
Under POSIX.1-2001, SUSv3: "XSI conformance constitutes the Single
UNIX Specification version 3 (SUSv3)" is problematic. I think I
touched on this in the previous discussion. I would suggest
deleting that sentence and instead inserting, before "Two
Technical Corrigenda ...", the following:
The Single UNIX Specification version 3 (SUSv3) comprises the
Base Specifications containing XBD, XSH, XCU and XRAT as
above, plus X/Open Curses Issue 4 version 2 as an extra volume
that is not in POSIX.1-2001.
Something similar is needed in the POSIX.1-2008, SUSv4 entry where
it talks about "the same four parts". The extra volume this time
is X/Open Curses Issue 7.
]]
Cowritten-by: Geoff Clare <gwc@opengroup.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The fact that a more negative nice value means higher
priority is a continuing source of confusion.
Reported-by: Dan Kenigsberg <danken@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove superfluous paragraph macros.
Remove request ".br" if it precedes a line, that begins with a
space, as such lines automatically cause a break.
There is no change in the output from "nroff" and "groff".
###
Examples of warnings from "mandoc -Tlint":
mandoc: bindresvport.3:41:2: WARNING: skipping paragraph macro: PP after SH
mandoc: crypt.3:228:2: WARNING: skipping paragraph macro: PP empty
mandoc: dlinfo.3:151:2: WARNING: skipping paragraph macro: IP empty
mandoc: exec.3:86:2: WARNING: skipping paragraph macro: PP after SS
mandoc: getsubopt.3:45:2: WARNING: skipping paragraph macro: br before text line with leading blank
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove superfluous paragraph macros.
Remove ".br" if it is before a line that starts with a space
character, as such lines automatically cause a break.
###
The output is unchanged, except two empty lines are added at the
bottom (before the footer line) in the output of "nroff" for the files
"alloc_hugepages.2" and "userfaultfd.2".
###
Examples of warnings from "mandoc -Tlint":
mandoc: access.2:283:2: WARNING: skipping paragraph macro: PP after SH
mandoc: adjtimex.2:185:2: WARNING: skipping paragraph macro: PP empty
mandoc: futex.2:728:2: WARNING: skipping paragraph macro: IP empty
mandoc: getsid.2:48:2: WARNING: skipping paragraph macro: br before text line with leading blank
mandoc: init_module.2:290:2: WARNING: skipping paragraph macro: PP after SS
mandoc: ioctl_fideduperange.2:27:2: WARNING: skipping paragraph macro: br after SH
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Trim tailing space in "strings".
There is no change in the output from "nroff" and "groff".
###
Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
troff: <attributes.7>:510: warning: trailing space
troff: <attributes.7>:512: warning: trailing space
troff: <attributes.7>:513: warning: trailing space
troff: <attributes.7>:516: warning: trailing space
troff: <attributes.7>:649: warning: trailing space
troff: <attributes.7>:681: warning: trailing space
troff: <attributes.7>:720: warning: trailing space
####
troff: <environ.7>:181: warning: trailing space
troff: <environ.7>:182: warning: trailing space
####
troff: <ip.7>:820: warning: trailing space
####
troff: <signal.7>:316: warning: trailing space
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
####
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
Input file is ./proc.5
troff: <proc.5>:4410: warning: trailing space
troff: <proc.5>:5206: warning: trailing space
troff: <proc.5>:5488: warning: trailing space
###
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Trim trailing space.
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These links were mostly created when pages were moved between
sections, in almost every case several years ago. The idea
was to allow people time to get used to the new section numbers
while still having commands of the form "man <sec> <page>"
work as before. Let's assume that people have now had time to
get used to the new section numbers, and remove these links.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These are all links that were created several years ago, mainly
when pages were migrated to different sections, in order to
allow the 'man' commands using the old section numbers to work.
However, the plan was always to eventually remove them, after
allowing people who cared to get used to the new section numbers.
Now, after 5+ years in each case, it's time to remove
these links.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Traditionally, magic links have not been a well-understood topic
in Linux. This helps clarify some of the terminology used in
openat2.2.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The variable is used in the code example, but not declared,
leading to a compilation error.
Signed-off-by: Oleksandr Kravchuk <open.source@oleksandr-kravchuk.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I believe new users should be discouraged from using atoi() and
that its disadvantages should be explained.
I added the information that 0 is returned on error - although C
standard and POSIX say that "If the value of the result cannot be
represented, the behavior is undefined." there are some
interpretations that 0 has to be returned
https://stackoverflow.com/questions/38393162/what-can-i-assume-about-the-behaviour-of-atoi-on-error
and this is also what happens in practice with glibc, musl and
uClibc.
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pgrep for example searches for a process name in /proc/pid/status
and therefore cannot find processes whose names are longer than 15
characters unless -f is specified.
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reorder full wordings to match the order of abbreviations.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting with Linux 5.8, setns() can take a PID file descriptor as
an argument, and move the caller into or more of the namespaces of
the thread referred to by that descriptor.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page currently incorrectly says that 'fd' must refer to
a descendant PID namespace. However, 'fd' can also refer to
the caller's current PID namespace. Verified by experiment,
and also comments in kernel/pid_namespace.c (Linux 5.8-rc1).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Output is from: test-groff -b -e -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
There is no change in the output of "nroff" and "groff".
####
troff: <fts.3>:50: warning: trailing space
####
troff: <getgrnam.3>:175: warning: trailing space
####
troff: <getpwnam.3>:181: warning: trailing space
####
troff: <rcmd.3>:52: warning: trailing space
troff: <rcmd.3>:57: warning: trailing space
troff: <rcmd.3>:60: warning: trailing space
troff: <rcmd.3>:63: warning: trailing space
troff: <rcmd.3>:69: warning: trailing space
troff: <rcmd.3>:73: warning: trailing space
####
troff: <rexec.3>:48: warning: trailing space
troff: <rexec.3>:51: warning: trailing space
####
troff: <sem_open.3>:36: warning: trailing space
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove superfluous space at the end of a processed input line.
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current version shows the square brackets, '[' and ']', in
boldface.
Use the '\c' escape sequence (function) to join the output of two
macros.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the PR_PAC_RESET_KEYS ioctl added in Linux
5.0 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The 'comm' value is typically the same as the (possibly
truncated) executable name, but may be something different.
Reported-by: Jonny Grant <jg@jguk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Recently I had to troubleshoot a problem where a connect() call
was returning EACCES:
17648 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 37
17648 connect(37, {sa_family=AF_INET, sin_port=htons(8081),
sin_addr=inet_addr("10.12.1.201")}, 16) = -1 EACCES (Permission
denied)
I've traced this to SELinux policy denying the connection. This is
on a Fedora 23 VM:
$ cat /etc/redhat-release
Fedora release 23 (Twenty Three)
$ uname -a
Linux mako-fedora-01 4.8.13-100.fc23.x86_64 #1 SMP Fri Dec 9 14:51:40
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
The manpage says this can happen when connecting to a broadcast
address, or when a local firewall rule blocks the connection.
However, the address above is unicast, and using 'wget' from
another account to access the URL works fine.
The context is that we're building an OS image, and this involves
downloading RPMs through a proxy. The proxy (polipo) is labelled
by SELinux, and I guess there is some sort of policy that says
"proxy can only connect to HTTP ports". When trying to connect to
a server listening on a port that is not labeled as an HTTP server
port, I guess SELinux steps in. With 'setenforce 0', the build
works fine. In the kernel sources I see connect() calls
security_socket_connect() (see
https://elixir.bootlin.com/linux/latest/source/net/socket.c#L1855),
which calls whatever security hooks are registered. I see the
SELinux hook getting registered at
https://elixir.bootlin.com/linux/latest/source/security/selinux/hooks.c#L7047,
and setting a perf probe on the call proves that the
selinux_socket_connect function gets called (while
tcp_v4_connect() is not).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email conversation with Léo Stefanesco:
> In the man7.org version of the man page for user_namespaces(7), it reads:
>
> there are many privileged operations that affect
> resources that are not associated with any namespace type,
> for example, changing the system time
> (governed by CAP_SYS_TIME)
>
> which is not consistent with time_namespaces(7).
In fact, strictly peaking the text still is correct, even after
the arrival of time namespaces.
Time namespaces virtualize only the boot-time and monotonic
clocks, not the "real time" (i.e., calendar time), which is the
time referred in the passage you quote.
That said, the text is perhaps now a little misleading, and
a little clarification would help. I changed the text to:
there are many privileged operations that affect
resources are not associated with any namespace type,
for example, changing the system **(i.e., calendar)** time
(governed by CAP_SYS_TIME)
Reported-by: Léo Stefanesco <leo.lveb@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This page was first added more than 20 years ago. Since
that time it has seen hardly any update, and is by now
very much out of date, as reported by Heinrich Schuchardt
and confirmed by Eugene Syromyatnikov.
As Heinrich says:
Man-pages like netdevices.7 or ioctl_fat.2 are what is
needed to help a user who does not want to read through the
kernel code.
If ioctl_list.2 has not been reasonably maintained since
Linux 1.3.27 and hence is not a reliable source of
information, shouldn't it be dropped?
My answer is, yes (but let's move a little info into ioctl(2)).
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reported-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In preparation for removing ioctl_list(2), let's preserve
some useful text that was added to ioctl_list(2)
by Andries Brouwer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
RAND_MAX is for rand(3). POSIX fixes random()'s range at 2^31-1;
RAND_MAX may be smaller on some platforms (even though with glibc
or musl on Linux they are the same).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
FAN_ONDIR was an input only flag before introducing
FAN_REPORT_FID. Since the introduction of FAN_REPORT_FID, it can
also be in output mask.
Move the text describing its role in the output mask to fanotify.7
where the other output mask bits are documented.
[mtk: commit message tidy-up]
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It was inserted in the middle of the FAN_CLASS_ multi flags bit
and broke the multi flag documentation.
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This reverts commit a93e5c9593.
FAN_DIR_MODIFY was disabled for v5.7 release by kernel commit
f17936993af0 ("fanotify: turn off support for FAN_DIR_MODIFY").
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the few pages where this heading (which is "nonstandard" within
man-pages) is used, it always immediately follows CONFORMING TO
and generally contains information related to standards. Remove
the section heading, thus incorporating AVAILABILITY into
CONFORMING TO.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
EXAMPLES appears to be the wider majority usage across various
projects' manual pages, and is also what is used in the POSIX
manual pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
man-pages doesn't have a REPORTING BUGS section in manual pages,
but many other projects do. Make some recommendations about
placement of that section.
man-pages doesn't use COPYRIGHT sections in manual pages, but
various projects do. Make some recommendations about placement
of the section.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Although man-pages doesn't use AUTHORS sections, many projects do
use an AUTHORS section in their manual pages, so mention it in
man-pages to suggest some guidance on the position at which
to place that section.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There is one case of a cross-reference to a kernel documentation
filename that uses unescaped hyphens.
To avoid misrendering, escape these as \- similarly to other
instances.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add the PR_SPEC_DISABLE_NOEXEC mode added in Linux 5.1
for the PR_SPEC_STORE_BYPASS "misfeature" of
PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add the PR_SPEC_INDIRECT_BRANCH "misfeature" added in Linux 4.20
for PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The fact that an FE_UNDERFLOW exception is not raised for
"Range error: result underflow" is intended behavior.
See https://www.sourceware.org/bugzilla/show_bug.cgi?id=6806.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The function make_message illustrates how to use vsnprintf to
determine the required amount of memory for a specific format and
its arguments.
If make_message is called with a format which will use exactly
INT_MAX characters (excluding '\0'), then the size++ calculation
will overflow the signed integer "size", which is an undefined
behaviour in C.
Since malloc and vsnprintf rightfully take a size_t argument, I
decided to use a size_t variable for size calculation. Therefore,
this patched code uses variables of the same data types as
expected by function arguments.
Proof of concept (tested on Linux/glibc amd64):
int main() { make_message("%647s%2147483000s", "", ""); }
If the code is compiled with address sanitizer (gcc
-fsanitize=address) you can see the following line, assuming that
a signed integer overflow simply leads to INT_MIN:
==3094==WARNING: AddressSanitizer failed to allocate 0xffffffff80000000 bytes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added in kernel commit b6fb293f2497a9841d94f6b57bd2bb2cd222da43
Text from comment in include/uapi/asm-generic/mman.h.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added in kernel commit 16ba6f811dfe44bc14f7946a4b257b85476fc16e.
Text taken from comments in include/linux/mm.h.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents a flag added in the following kernel commit:
commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
Author: Rik van Riel <riel@redhat.com>
Date: Wed Sep 6 16:25:15 2017 -0700
mm,fork: introduce MADV_WIPEONFORK
This was already documented in man2/madvise.2 in the commit:
commit c0c4f6c29c
Author: Rik van Riel <riel@redhat.com>
Date: Tue Sep 19 20:32:00 2017 +0200
madvise.2: Document MADV_WIPEONFORK and MADV_KEEPONFORK
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The gettid() wrapper was added glibc 2.30, and is declared by
<unistd.h> if _GNU_SOURCE is defined.
Reported-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the text ("rare)" after a note from Vincent Lefèvre:
Subject: [Bug math/13932] dbl-64 pow unexpectedly slow for some inputs
Date: Sat, 23 May 2020 21:31:52 +0000
From: vincent-srcware at vinc17 dot net <sourceware-bugzilla@sourceware.org>
To: mtk.manpages@gmail.comhttps://sourceware.org/bugzilla/show_bug.cgi?id=13932
--- Comment #26 from Vincent Lefèvre <vincent-srcware at vinc17 dot net> ---
(In reply to Michael Kerrisk from comment #25)
> Fix documented for man-pages-5.07.
[...]
> -On 64-bits,
> +Before glibc 2.28,
> .\"
> .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13932
> +on some architectures (e.g., x86-64)
> .BR pow ()
> may be more than 10,000 times slower for some (rare) inputs
> than for other nearby inputs.
[...]
The problematic values are uncommon, but not so rare, in the sense
that they are close to simple values, i.e. are likely to occur in
practice. An example given above: pow(0.999999999999999889, 1.5)
1 and 1.5 are very simple values, which are more likely to occur
in practice than some fixed random value. Then it suffices to have
a small rounding error on 1...
For instance, this is very different from hard-to-round cases of
exp, which are also very slow IMHO, but unless one writes a
specific program for them, no-one should notice the slowness
because such a case would typically occur only once among billions
(I don't remember the accuracy before the slowest path in this
library).
Reported-by: Vincent Lefèvre <vincent-srcware@vinc17.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This reflects glibc commit cad64f778aced84efdaa04ae64f8737b86f063ab
("ldconfig: Default to the new format for ld.so.cache").
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The terms POSIX.1-{2003,2004,2013,2016} were inventions of
my imagination, as confirmed by consulting Geoff Clare of
The Open Group. Remove these names.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Avoid implying that use of IFUNC is the only way to produce a
symbol with NULL value. Give more scenarios how a symbol may get
NULL value, but explain that in those scenarios dlsym() will fail
with Glibc's ld.so due to an implementation inconsistency.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
iproute2 allows you to specify the netns for either side of a veth
interface at creation time. Add an example of this to veth(4) so
it doesn't sound like you have to move the interfaces in a
separate step.
Verified with commands:
# ip netns add alpha
# ip netns add bravo
# ip link add foo netns alpha type veth peer bar netns bravo
# ip -n alpha link show
# ip -n bravo link show
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Adding description of new directories (/run, /usr/libexec,
/usr/share/color,/usr/share/ppd, /var/lib/color), stating
/usr/X11R6 as removed and updating URL to and version of FHS.
See https://bugzilla.kernel.org/show_bug.cgi?id=206693
Reported-by: Gary Perkins <glperkins@lit.edu>
Signed-off-by: Thomas Piekarski <t.piekarski@deloquencia.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted in email by Christian Brauner:
I forgot to mention that spawning directly into a target
cgroup is also more efficient than moving it after creation.
The specific reason is mentioned in the commit message
[ef2c41cf38a], the write lock of the semaphore need not be
taken in contrast to when it is moved afterwards. That
implementation details is not that interesting but it might
be interesting to know that it provides performance benefits
in general.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is a sequel to commit baf17bc4f2, addressing the
issues with missing commas in the middle of SEE ALSO lists that
emerged since.
The awk script from the original commit was not working and had to
be slightly modified (s/["]SEE ALSO["]/"?SEE ALSO/), otherwise it
works like a charm. Here's the fixed script and its output just
before this commit:
for f in man*/*; do
awk '
/^.SH "?SEE ALSO/ {
sa=1; print "== " FILENAME " =="; print; next
}
/^\.(PP|SH)/ {
sa=0; no=0; next
}
/^\.BR/ {
if (sa==1) {
print;
if (no == 1)
print "Missing comma in " FILENAME " +" FNR-1; no=0
}
}
/^\.BR .*)$/ {
if (sa==1)
no=1;
next
}
/\.\\"/ {next}
/.*/ {
if (sa==1) {
print; next
}
}
' $f; done | grep Missing
Missing comma in man1/memusage.1 +272
Missing comma in man2/adjtimex.2 +597
Missing comma in man2/adjtimex.2 +598
Missing comma in man2/mkdir.2 +252
Missing comma in man2/sigaction.2 +1045
Missing comma in man2/sigaction.2 +1047
Missing comma in man3/mbsnrtowcs.3 +198
Missing comma in man3/ntp_gettime.3 +142
Missing comma in man3/strcmp.3 +219
Missing comma in man3/strtol.3 +302
Missing comma in man3/wcstombs.3 +120
Missing comma in man7/user_namespaces.7 +1378
Missing comma in man7/xattr.7 +198
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
prctls that are architecture-specific won't work on other
architectures, and arch-specific prctls that manipulate optional
hardware features likewise won't work if that hardware feature is
not present.
The established pattern seems to be to treat such prctls as if they
are unimplemented, when attempted on the wrong hardware.
Cover these cases with some generic weasel words in the closet
existing EINVAL clause.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a few very minor bits of punctuation in
PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of PR_SET_PDEATHSIG refers to "maxsig", which
is apparently intended to stand for the maximum defined signal
number.
maxsig seems not to be a thing, even in the kernel.
Reword to use the standard constant NSIG. (Discussion of SIGRTMIN
and SIGRTMAX seems out of scope here, and anyway is not relevant
to the kernel.)
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Intel MPX API was removed from Linux 5.4. See Linux
commit f240652b6032 ("x86/mpx: Remove MPX APIs")
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The prctl list has historically been sorted by prctl name (ignoring
any SET_ or GET_ prefix) to make individual prctls easier to find.
Some noise seems to have crept in since.
Sort the list back into order. Similarly, reorder the list of
prctls specified to return non-zero values on success.
Content movement only. No semantic change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The prctl.2 source is unnecessarily hard to navigate, not least
because prctl option flags are traditionally named PR_* and so look
just like prctl names.
For each actual prctl, add a comment of the form
.\" prctl PR_FOO
to make it move obvious where each top-level prctl starts.
Of course, we could add some clever macros, but let's not confuse
dumb parsers.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In reality, almost every prctl interferes with assumptions that the
compiler and C library / runtime rely on. prctl() can therefore
make userspace explode in a variety ways that are likely to be hard
to debug.
This is not obvious to the uninitiated, so add a warning.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Under PR_SET_NAME, the [tid] value seen in procfs as
/proc/self/task/[tid] is mistakenly described as the name of the
thread, whereas really the name is on /proc/self/task/[tid]/comm.
Fix it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current synopsis for prctl(2) misleadingly claims that prctl
operates on a process. Rather, some (in fact, most) prctls operate
on a thread.
The wording probably dates back to the old days when Linux didn't
really have threads at all.
Reword as appropriate.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
No content changes. Just put things in a slightly more logical
order and add a few paragraph breaks for readability.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that another reason to use the *at() APIs is to access
'flags' functionality that is not available in the corresponding
conventional APIs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
By way of a hint that the file descriptor returned by dirfd()
could usefully be fed to the *at() APIs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both functions behave the same wrt return value, no need to describe
them separately.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
arm64 is currently documented as receiving the syscall number in
x8.
While this is the correct register, the syscall number is a 32-bit
integer. Bits [63:32] are ignored by the kernel.
So it is more correct to say "w8".
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The arm OABI syscall interface is currently documented in terms of
register name aliases defined by the ARM Procedure Call Standard
(APCS). However, these don't make sense in the context of a
binary interface that doesn't comply (or need to comply) with
APCS.
Use the real architectural register names instead.
The names a1-a4, v1... are just aliases for r0-r3, r4... anyway,
so the interface is just the same regardless of which set of names
is used.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit 96c5865559cee0f9cbc5173f3c949f6ce3525581,
the 'lo_flags' field is modifiable via the LOOP_SET_STATUS and
LOOP_SET_STATUS64 ioctl() operations.
See https://bugzilla.kernel.org/show_bug.cgi?id=203417
Reported-by: Vlad <cvazir@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are 2 typos:
file_in is used instead of fd_in in the ERRORS and NOTES sections.
file_out is used instead of fd_out in the ERRORS section.
Reported-by: Ricardo Castano <ricardo.castano.salinas@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This update addresses an issue described in
https://bugzilla.kernel.org/show_bug.cgi?id=207345
In answer to my question, Paul Eggert noted:
> Where do I find it?
https://www.iana.org/time-zones
Look under "Latest version", which is 2020a.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Reported-by: Marco Curreli <marcocurreli@tiscali.it>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux man page for ptsname_r, when describing the behaviour
in the error case, is
- not consistent with the future POSIX standard (POSIX Issue 8).
- not consistent with musl libc.
Find attached a patch to
- keep it consistent with what glibc does,
- make it consistent with musl libc,
- make it consistent with the future POSIX standard (POSIX
Issue 8).
Details:
glibc's implementation of ptsname_r, when it fails, returns the
error code as return value AND sets errno. See
https://sourceware.org/git/?p=glibc.git;a=blob;f=login/ptsname.chttps://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/mach/hurd/ptsname.chttps://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/ptsname.c
musl's implementation of ptsname_r, when it fails, returns the error code
but does NOT set errno. See
https://git.musl-libc.org/cgit/musl/tree/src/misc/pty.c
The proposal to add ptsname_r to POSIX, with text
"If successful, the ptsname_r( ) function shall return zero.
Otherwise, an error number shall be returned to indicate the
error."
has been accepted for inclusion in POSIX Issue 8.
http://austingroupbugs.net/view.php?id=508
Therefore a portable program should look at the return value from
ptsname_r, NOT the errno value. The current text in the man page
suggests to look at the errno value, which is wrong (because of
musl libc) and not future-proof (because of future POSIX).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page of attr(1) is relevant to xattrs, therefore add it to the
SEE ALSO section.
attr(1) command works for other filesystems as well.
Signed-off-by: Achilles Gaikwad <agaikwad@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Used Bird's source code, kernel source code, iproute2 source code
and iproute2 manpages to find meanings of these new attributes.
Signed-off-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=198569.
Reported-by: Alexander Morozov <alexandermv@gmail.com>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Redundant because this is a Section 3 page, and the
text also describes the implementation in glibc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Paul Eggert:
From: Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: Errors in man pages, here: tzfile(5): Typo?
On 4/20/20 12:27 AM, Michael Kerrisk (man-pages) wrote:
> I think "UT" here is intended to mean "Universal Time", and as such
> should not be "UTC". Perhaps Paul can comment.
Yes, that's right. The tzfile format covers timestamps that predate the
introduction of UTC in 1960, so the documentation uses the sloppier and
more-general term "UT" instead of the more-precise term "UTC".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the details of the new FAN_DIR_MODIFY event, which
introduces entry name information to the fanotify event
reporting format.
Enhance the fanotify_fid.c example to also report this event.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- The condition for printing "subdirectory created" was always
true.
- The arguments and error check of open_by_handle_at() were
incorrect.
- Fix example description inconsistencies.
- Nicer indentation of example output.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the new event types that were added in v5.1 along with
init flag FAN_REPORT_FID are not eligible for reporting to a
directory watching with FAN_EVENT_ON_CHILD.
Document the events that cannot be generated on children of a
watching parent.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is not true that FAN_MARK_MOUNT cannot be used with a group
that was initialized with flag FAN_REPORT_FID.
The correct assertion is that events that require a group with
flag FAN_REPORT_FID cannot be requested on a mark mount.
For exaple, a FAN_OPEN event can be requested on a mark mount and
will generate an event with file handle information if the group
was initialized with flag FAN_REPORT_FID.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I was experimenting with some possible changes to adjtimex(2) and
clock_adjtime(2) and tried to look up the man page to see what the
documented behavior is when I noticed that clock_adjtime() appears
to be the only system call that is currently undocumented.
Before I do any changes to it, this tries to document what I
understand it currently does.
[ RC: Add better explanations of the usage and error codes
and correct some typographical mistakes. ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux has allowed passing open file descriptors to clock_gettime()
and friends since v2.6.39. This patch documents these "dynamic"
clocks and adds a brief example of how to use them.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure in the kernel appears to be named 'dsp56k_upload'
not 'dsp56k_binary'. And this appears always to have been so.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Taken from Documentation/filesystems/proc.txt.
Reported-by: Helge Kreutzmann <debian@helgefjell.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.26, posix_spawn (2) function accepts the
POSIX_SPAWN_SETSID flag. This flag has been accepted by POSIX and
should be added to the next major revision. The current support
can be enabled with _GNU_SOURCE.
Upstream commit in glibc.git:
daeb1fa2e1 [BZ 21340] add support for POSIX_SPAWN_SETSID
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The implementation of the fork() step in posix_spawn(2) relies on
either fork(2), vfork(2) or clone(2) depending on the version of
the glibc and the arguments passed to posix_spawn(2).
It is sometimes ambiguous whether, when we are mentioning
"fork(2)", we are referring to the fork() step or the actual
fork(2) syscall.
This patch hopefully avoids the ambiguity by replacing confusing
occurrences by "the xxx() step" where appropriate.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added a few lines about POSIX_SPAWN_USEVFORK so that it appears
clearly that since glibc 2.24, the flag has no effect.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.24, the use of posix_spawn (2) makes an
unconditional call to clone(CLONE_VM | CLONE_VFORK ...) rather
than relying on fork (2) or vfork (2).
As a consequence, the statements regarding the use of the flag
POSIX_SPAWN_USEVFORK and how the function decides whether it
should use fork (2) or vfork (2) are obsolete since glibc 2.24.
This patch makes a distinction in the manual page between glibc
2.24 and older versions.
Upstream commit in glibc.git:
9ff72da471 posix: New Linux posix_spawn{p} implementation
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Taken from Linux v5.3-rc2. Add a reference to the header file to
save the future reader some time figuring out whether more entries
exist.
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In Linux 4.4, the allowed BPF helper functions that could
be called was governed by a check in sk_filter_func_proto().
Nowadays (Linux 5.6), it is I think governed by the check in
sk_filter_func_proto().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This notes that the kernel now allows calls to bpf() without CAP_SYS_ADMIN
under some circumstances.
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The cgroup.sane_behavior file returns the hard-coded value "0" and
is kept for legacy purposes. Mention this in the man-page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the
semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons
(usually a busy page). This was an unintentional change that
hasn't been noticed except for LTP tests which checked for the
documented behavior.
There are two ways to go around this change. We can even get back
to the original behavior and return -EAGAIN whenever migrate_pages
is not able to migrate pages due to non-fatal reasons. Another
option would be to simply continue with the changed semantic and
extend move_pages documentation to clarify that -errno is returned
on an invalid input or when migration simply cannot succeed (e.g.
-ENOMEM, -EBUSY) or the number of pages that couldn't have been
migrated due to ephemeral reasons (e.g. page is pinned or locked
for other reasons).
We decided to keep the second option in kernel because this
behavior is in place for some time without anybody complaining and
possibly new users depending on it. Also it allows to have a
slightly easier error handling as the caller knows that it is
worth to retry when err > 0.
Update man pages to reflect the new semantic.
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document that the signum argument is ignored in newer kernels, but
that user space should pass a valid real-time signal number for
backwards compatibility.
Cowritten-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified from inspection of kernel source code.
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The paragraph on Linux VM is rather generic, and does not belong
in DESCRIPTION. In fact, I'm not sure it even belongs in this
page. At the least, let's move it to NOTES.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And while we are at it, remove a sentence that makes an obvious
point (that mremap() uses the Linux page table scheme).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details of glibc 2.4, which is by now fairly old, would be
better at the end of NOTES than at the start.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a mailing list conversation:
On 5/24/18 9:03 PM, Heinrich Schuchardt wrote:
> Hello Michael,
>
> in the mmap(2) man page MAP_ANON is described as deprecated.
>
> When I look at the NetBSD manpage
> http://netbsd.gw.com/cgi-bin/man-cgi?mmap+2+NetBSD-current
> I found that MAP_ANONYMOUS is not defined.
>
> https://www.dragonflybsd.org/cgi/web-man?command=mmap§ion=2
> indicates MAP_ANONYMOUS is an alias for MAP_ANON and is provided for
> compatibility.
>
> https://man.openbsd.org/mmap.2 also knows MAP_ANONYMOUS as a synonym.
>
> https://www.unix.com/man-page/osx/2/mmap/ does not know MAP_ANONYMOUS.
>
> So shouldn't the man page indicate that MAP_ANON is to be favored to
> write portable code? And correspondingly mark MAP_ANONYMOUS as synonym
> only kept for compatibility.
>
> The Open Group Base Specifications Issue 7, 2018 Edition does not
> reference either of both. So both values are not POSIX but it is not
> correct to describe them as Linux only.
The text saying that MAP_ANON is deprecated is ancient (at least
20 years old). I don't know why that text was added.
Things are not simple though: it looks like there's at least
one historical implementation (HP-US) that defines MAP_ANONYMOUS
but not MAP_ANON.
I've applied the patch below.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Heinrich Schuchardt:
he list contains hex values of different constants. I just wonder for
which architecture (alpha, i386, mips, or sparc at that time). No
information is supplied.
Current values depend on the architecture, e.g.
On amd64
0x82307201 VFAT_IOCTL_READDIR_BOTH
0x82307202 VFAT_IOCTL_READDIR_SHORT
0x80047210 FAT_IOCTL_GET_ATTRIBUTES
0x40047211 FAT_IOCTL_SET_ATTRIBUTES
0x80047213 FAT_IOCTL_GET_VOLUME_ID
On mips
0x42187201 VFAT_IOCTL_READDIR_BOTH
0x42187202 VFAT_IOCTL_READDIR_SHORT
0x40047210 FAT_IOCTL_GET_ATTRIBUTES
0x80047211 FAT_IOCTL_SET_ATTRIBUTES
0x40047213 FAT_IOCTL_GET_VOLUME_ID
Hence hex values should be removed.
Reported-by:
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit 337684a1746f "fs: return EPERM on immutable
inode" changed (nd unified the return value of the utimensat(2)
from -EACCES to -EPERM in case of an immutable flag. Modify the
man page to reflect the same.
The entire discussion of returning the correct return value is at:
http://lists.linux.it/pipermail/ltp/2017-January/003424.html
[mtk: The change was in Linux 4.8]
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A year cannot only begin with week number 53 of the previous year but
also with week number 52. Year 2011 is an example for this case, as
can be easily seen with GNU date:
$ date -d "jan 1 2011" "+%c %V %G"
Sat Jan 1 00:00:00 2011 52 2010
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The sysctls fs.protected_fifos and fs.protected_regular can cause
open(2) to fail with EACCES (see Documentation/sysctl/fs.rst for
details.)
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see, /proc/sys/fs/aio-max-nr is a
system-wide limit, not a per-user limit. This seems to be
confirmed by comments in fs/aio.c (Linux 5.6) sources).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The restriction to CAP_SYS_ADMIN was removed from map_files in
2015 [1]. There was a fixme that indicted this might happen, but
the main text was never updated when this commit landed. While
we're at it, add a note about the ptrace access check that is
still required.
[1] bdb4d100af
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The info here is mostly ancient, certainly incomplete,
and is not consistently maintained. Best to remove it, I think.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Current code ignores the MPOL_MF_STRICT when handling hugetlb
mapping, now patch([1]) handles MPOL_MF_STRICT in same semantic as
other mapping. So, we can remove the note about 'MPOL_MF_STRICT
is ignored on huge page mappings', and no changes to other part of
man-page.
[1] https://lore.kernel.org/linux-mm/1581559627-6206-1-git-send-email-lixinhai.lxh@gmail.com/
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The display of the /proc/PID/ns renders very wide. Make it
narrower by eliminating some nonessential info via some
awk(1) filtering.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Andrei Vagin implemented a change I suggested:
clock-IDs are now be expressed in symbolic form (e.g.,
"monotonic") instead of numeric form (e.g., 1) when reading
/proc/PID/timerns_offsets, and can be expressed either
symbolically or numerically when writing to that file.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, note the ERANGE restrictions reported by
Thomas Gleixner.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Andrew Micallef <andrew.micallef@live.com.au>
Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Andrew Micallef <andrew.micallef@live.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The cmdline file is a window into memory that is controlled by the
target process, and that memory may be changed arbitrarily, as can
the window via prctl settings. Make sure people understand that
this file is all an illusion.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A new feature of one-shoot polling through io_submit was
introduced in bfe4037e722ec commit. Keep things up-to-date due
to changes in linux/aio_abi.h.
Signed-off-by: Julia Suvorova <jusual@mail.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
signal.7: Which signal is delivered in response to a CPU exception
is under-documented and does not always make sense. See
<https://bugzilla.kernel.org/show_bug.cgi?id=205831> for an
example where it doesn’t make sense; per the discussion there,
this cannot be changed because of backward compatibility concerns,
so let’s instead document the problem.
sigaction.2: For related reasons, the kernel doesn’t always fill
in all of the fields of the siginfo_t when delivering signals from
CPU exceptions. Document this as well. I imagine this one
_could_ be fixed, but the problem would still be relevant to
anyone using an older kernel.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After a comment from Matthew Bobrowski:
Although, I would just have to point out that it doesn't
necessarily have to be a "script" file, but rather a file of
any type that can have its contents interpreted, which then
results in a form of program execution i.e.
$ /usr/lib64/ld-linux-x86-64.so.2 ./foo
In this case, foo is not a "script" file.
Reported-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Back in 2011, a mail from Andrea Arcangeli noted some details
that I never got round to incorporating into the manual page.
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This subcommand was added a few years ago to support cpuid emulation
on x86 targets, but no changes to the man page appear to have been
made at the time. This commit adds a description for it and the
corresponding getter.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The example is misleading. It is not a good idea to unlink an
existing socket because we might try to start the server multiple
times. In this case it is preferable to receive an error.
We could add code that removes the socket when the server process
is killed but that would stretch the example too far.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update the details on AF_UNSPEC and circumstances in which
socket can be reconnected.
From a mail conversation with Eric Dumazet:
> connect() man page seems obsolete or confusing :
>
> Generally, connection-based protocol sockets may successfully
> connect() only once; connectionless protocol sockets may use
> connect() multiple times to change their association.
> Connectionless sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).>
>
> 1) At least TCP has supported AF_UNSPEC thing forever.
> 2) By definition connectionless sockets do not have an association,
> why would they call connect(AF_UNSPEC) to remove a connection
> which does not exist ...
Calling connect() on a connectionless socket serves two purposes:
a) Assigns a default outgoing address for datagrams (sent using write(2)).
b) Causes datagrams sent from sources other than the peer address to be
discarded.
Both of these things are true in AF_UNIX and the Internet domains.
Using connect(AF_UNSPEC) allows the local datagram socket to clear
this association (without having to connect() to a *different*
peer), so that now it can send datagrams to any peer and receive
datagrams for any peer, (I've just retested all of this.)
>
> Maybe we should rewrite this paragraph to match reality, since
> this causes confusion.
>
>
> Some protocol sockets may successfully connect() only once.
> Some protocol sockets may use connect() multiple times to change
> their association.
> Some protocol sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).
When I first saw your note, I was afraid that I had written
the offending text. But, I see it has been there since the
manual page was first added in 1992 (other than the piece
"(supported since on Linux since kernel 2.2)", which I added in
2007). Perhaps it was true in 1992.
Anyway, I confirm your statement about TCP sockets. The
connect(AF_UNSPEC) thing works; thereafter, the socket may be
connected to another socket.
Interestingly, connect(AF_UNSPEC) does not seem to work for
UNIX domain stream sockets. (My light testing gives an EINVAL
error on connect(AF_UNSPEC) of an already connected UNIX stream
socket. I could not easily spot where this error was being
generated in the kernel though.)
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the kernel version that added SO_TIMESTAMPNS,
and (from the kernel commit) note tha SO_TIMESTAMPNS and
SO_TIMESTAMP are mutually exclusive.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
===========
DESCRIPTION
===========
I added a paragraph for ``SO_TIMESTAMP``, and modified the
paragraph for ``SIOCGSTAMP`` in relation to ``SO_TIMESTAMPNS``.
I based the documentation on the existing ``SO_TIMESTAMP``
documentation, and
on my experience using ``SO_TIMESTAMPNS``.
I asked a question on stackoverflow, which helped me understand
``SO_TIMESTAMPNS``:
https://stackoverflow.com/q/60971556/6872717
Testing of the feature being documented
=======================================
I wrote a simple server and client test.
In the client side, I connected a socket specifying
``SOCK_STREAM`` and ``"tcp"``.
Then I enabled timestamp in ns:
.. code-block:: c
int enable = 1;
if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &enable,
sizeof(enable)))
goto err;
Then I prepared the msg header:
.. code-block:: c
char buf[BUFSIZ];
char cbuf[BUFSIZ];
struct msghdr msg;
struct iovec iov;
memset(buf, 0, ARRAY_BYTES(buf));
iov.iov_len = ARRAY_BYTES(buf) - 1;
iov.iov_base = buf;
msg.msg_name = NULL;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cbuf;
msg.msg_controllen = ARRAY_BYTES(cbuf);
And got some times before and after receiving the msg:
.. code-block:: c
struct timespec tm_before, tm_recvmsg, tm_after, tm_msg;
clock_gettime(CLOCK_REALTIME, &tm_before);
usleep(500000);
clock_gettime(CLOCK_REALTIME, &tm_recvmsg);
n = recvmsg(sd, &msg, MSG_WAITALL);
if (n < 0)
goto err;
usleep(1000000);
clock_gettime(CLOCK_REALTIME, &tm_after);
After that I read the timestamp of the msg:
.. code-block:: c
struct cmsghdr *cmsg;
for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
cmsg = CMSG_NXTHDR(&msg, cmsg)) {
if (cmsg->cmsg_level == SOL_SOCKET &&
cmsg->cmsg_type == SO_TIMESTAMPNS) {
memcpy(&tm_msg, CMSG_DATA(cmsg), sizeof(tm_msg));
break;
}
}
if (!cmsg)
goto err;
And finally printed the results:
.. code-block:: c
double tdiff;
printf("%s\n", buf);
tdiff = timespec_diff_ms(&tm_before, &tm_recvmsg);
printf("tm_r - tm_b = %lf ms\n", tdiff);
tdiff = timespec_diff_ms(&tm_before, &tm_after);
printf("tm_a - tm_b = %lf ms\n", tdiff);
tdiff = timespec_diff_ms(&tm_before, &tm_msg);
printf("tm_m - tm_b = %lf ms\n", tdiff);
Which printed:
::
asdasdfasdfasdfadfgdfghfthgujty 6, 0;
tm_r - tm_b = 500.000000 ms
tm_a - tm_b = 1500.000000 ms
tm_m - tm_b = 18.000000 ms
System:
::
Linux debian 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64
GNU/Linux
gcc (Debian 9.3.0-8) 9.3.0
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Matthew Wilcox:
The current text of the lseek manpage is ambiguous about
the behaviour of lseek(SEEK_DATA) for a file which is
entirely a hole (or the end of the file is a hole and the
pos lies within the hole). The draft POSIX language is
specific (ENXIO is returned when whence is SEEK_DATA and
offset lies within the final hole of the file). Could I
trouble you to wordsmith that in?
If you want to look at the draft POSIX text, it's here:
https://www.austingroupbugs.net/view.php?id=415
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See Linux source as of v5.4:
kernel/time/posix-clock.c
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email discussions with Thomas Gleixner:
======
Hello Thomas, et al,
Following on from our discussion of read() on a timerfd [1], I
happened to remember a Debian bug report [2] that points out that
timer_settime() can fail with the error ECANCELED, which is both
surprising and odd (because despite the error, the timer does get
updated).
The relevant kernel code (I think, from your commit [3]) seems to be
the following in timerfd_setup():
if (texp != 0) {
if (flags & TFD_TIMER_ABSTIME)
texp = timens_ktime_to_host(clockid, texp);
if (isalarm(ctx)) {
if (flags & TFD_TIMER_ABSTIME)
alarm_start(&ctx->t.alarm, texp);
else
alarm_start_relative(&ctx->t.alarm, texp);
} else {
hrtimer_start(&ctx->t.tmr, texp, htmode);
}
if (timerfd_canceled(ctx))
return -ECANCELED;
}
Using a small test program [4] shows the behavior. The program loops,
repeatedly calling timerfd_settime() (with a delay of a few seconds
before each call). In another terminal window, enter the following
command a few times:
$ sudo date -s "5 seconds" # Add 5 secs to wall-clock time
I see behavior as follows (the /sudo date -s "5 seconds"/ command was
executed before loop iterations 0, 2, and 4):
[[
$ ./timerfd_settime_ECANCELED
0
Current time is 1585729978 secs, 868510078 nsecs
Timer value is now 0 secs, 0 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999991977 nsecs
1
Current time is 1585729982 secs, 716339545 nsecs
Timer value is now 6 secs, 152167990 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999992940 nsecs
2
Current time is 1585729991 secs, 567377831 nsecs
Timer value is now 1 secs, 148959376 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999976294 nsecs
3
Current time is 1585729995 secs, 405385503 nsecs
Timer value is now 6 secs, 161989917 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999993317 nsecs
4
Current time is 1585730004 secs, 225036165 nsecs
Timer value is now 1 secs, 180346909 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999984345 nsecs
]]
I note from the above.
(1) If the wall-clock is changed before the first timerfd_settime()
call, the call succeeds. This is of course expected.
(2) If the wall-clock is changed after a timerfd_settime() call, then
the next timerfd_settime() call fails with ECANCELED.
(3) Even if the timerfd_settime() call fails, the timer is still updated(!).
Some questions:
(a) What is the rationale for timerfd_settime() failing with ECANCELED
in this case? (Currently, the manual page says nothing about this.)
(b) It seems at the least surprising, but more likely a bug, that
timerfd_settime() fails with ECANCELED while at the same time
successfully updating the timer value.
Your thoughts?
Thanks,
Michael
[1] https://lore.kernel.org/lkml/3cbd0919-c82a-cb21-c10f-0498433ba5d1@gmail.com/
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947091
[3]
commit 99ee5315dac6211e972fa3f23bcc9a0343ff58c4
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Apr 27 14:16:42 2011 +0200
timerfd: Allow timers to be cancelled when clock was set
[4]
/* timerfd_settime_ECANCELED.c */
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <inttypes.h>
#include <sys/timerfd.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
int
main(int argc, char *argv[])
{
struct itimerspec ts, gts;
struct timespec start;
int tfd = timerfd_create(CLOCK_REALTIME, 0);
if (tfd == -1)
errExit("timerfd_create");
ts.it_interval.tv_sec = 0;
ts.it_interval.tv_nsec = 10;
int flags = TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET;
for (long j ; ; j++) {
/* Inject a delay into each loop, by calling getppid()
many times */
for (int k = 0; k < 10000000; k++)
getppid();
if (j % 1 == 0)
printf("%ld\n", j);
/* Display the current wall-clock time */
if (clock_gettime(CLOCK_REALTIME, &start) == -1)
errExit("clock_gettime");
printf("Current time is %ld secs, %ld nsecs\n",
start.tv_sec, start.tv_nsec);
/* Before resetting the timer, retrieve its current value
so that after the timerfd_settime() call, we can see
whether the the value has changed */
if (timerfd_gettime(tfd, >s) == -1)
perror("timerfd_gettime");
printf("Timer value is now %ld secs, %ld nsecs\n",
gts.it_value.tv_sec, gts.it_value.tv_nsec);
/* Reset the timer to now + 10 secs */
ts.it_value.tv_sec = start.tv_sec + 10;
ts.it_value.tv_nsec = start.tv_nsec;
if (timerfd_settime(tfd, flags, &ts, NULL) == -1)
perror("timerfd_settime");
else
printf("timerfd_settime() succeeded\n");
/* Display the timer value once again */
if (timerfd_gettime(tfd, >s) == -1)
perror("timerfd_gettime");
printf("Timer value is now %ld secs, %ld nsecs\n",
gts.it_value.tv_sec, gts.it_value.tv_nsec);
printf("\n");
}
}
=======
Subject: Re: timer_settime() and ECANCELED
Date: Wed, 01 Apr 2020 19:42:42 +0200
From: Thomas Gleixner <tglx@linutronix.de>
Michael,
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Following on from our discussion of read() on a timerfd [1], I
> happened to remember a Debian bug report [2] that points out that
> timer_settime() can fail with the error ECANCELED, which is both
> surprising and odd (because despite the error, the timer does get
> updated).
...
> (1) If the wall-clock is changed before the first timerfd_settime()
> call, the call succeeds. This is of course expected.
> (2) If the wall-clock is changed after a timerfd_settime() call, then
> the next timerfd_settime() call fails with ECANCELED.
> (3) Even if the timerfd_settime() call fails, the timer is still updated(!).
>
> Some questions:
> (a) What is the rationale for timerfd_settime() failing with ECANCELED
> in this case? (Currently, the manual page says nothing about this.)
> (b) It seems at the least surprising, but more likely a bug, that
> timerfd_settime() fails with ECANCELED while at the same time
> successfully updating the timer value.
Really good question and TBH I can't remember why this is implemented in
the way it is, but I have a faint memory that at least (a) is
intentional.
After staring at the code for a while I came up with the following
answers:
(a): If the clock was set event ("date -s ...") which triggered the
cancel was not yet consumed by user space via read(), then that
information would get lost because arming the timer to the new
value has to reset the state.
(b): Arming the timer in that case is indeed very questionable, but it
could be argued that because the clock was set event happened with
the old expiry value that the new expiry value is not affected.
I'd be happy to change that and not arm the timer in the case of a
pending cancel, but I fear that some user space already depends on
that behaviour.
Thanks,
tglx
======
Subject: Re: timer_settime() and ECANCELED
Date: Thu, 02 Apr 2020 10:49:18 +0200
From: Thomas Gleixner <tglx@linutronix.de>
To: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> On 4/1/20 7:42 PM, Thomas Gleixner wrote:
>> (b): Arming the timer in that case is indeed very questionable, but it
>> could be argued that because the clock was set event happened with
>> the old expiry value that the new expiry value is not affected.
>>
>> I'd be happy to change that and not arm the timer in the case of a
>> pending cancel, but I fear that some user space already depends on
>> that behaviour.
>
> Yes, that's the risk, of course. So, shall we just document all
> this in the manual page?
I think so.
Thanks,
tglx
======
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
prctl commands added to the linux kernel for 5.6 in commit:
commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
Author: Mike Christie <mchristi@redhat.com>
Date: Mon Nov 11 18:19:00 2019 -0600
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
Reviewed-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mostly verified by testing and reading the code.
There is unfortunately quite a bit of inconsistency across API~s:
clock_gettime clock_settime clock_nanosleep timer_create timerfd_create
CLOCK_BOOTTIME y n (EINVAL) y y y
CLOCK_BOOTTIME_ALARM y n (EINVAL) y [1] y [1] y [1]
CLOCK_MONOTONIC y n (EINVAL) y y y
CLOCK_MONOTONIC_COARSE y n (EINVAL) n (ENOTSUP) n (ENOTSUP) n (EINVAL)
CLOCK_MONOTONIC_RAW y n (EINVAL) n (ENOTSUP) n (ENOTSUP) n (EINVAL)
CLOCK_REALTIME y y y y y
CLOCK_REALTIME_ALARM y n (EINVAL) y [1] y [1] y [1]
CLOCK_REALTIME_COARSE y n (EINVAL) n (ENOTSUP) n (ENOTSUP) n (EINVAL)
CLOCK_TAI y n (EINVAL) y y n (EINVAL)
CLOCK_PROCESS_CPUTIME_ID y n (EINVAL) y y n (EINVAL)
CLOCK_THREAD_CPUTIME_ID y n (EINVAL) n (EINVAL [2]) y n (EINVAL)
pthread_getcpuclockid() y n (EINVAL) y y n (EINVAL)
[1] The caller must have CAP_WAKE_ALARM, or the error EPERM results.
[2] This error is generated in the glibc wrapper.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Presumably (and from a quick glance at the source code)
since Linux 3.10. when CLOCK_TAI was introduced.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Presumably (and from a quick glance at the source code)
since Linux 2.6.39, when CLOCK_BOOTTIME was introduced.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email discussions with Aleksa Sarai:
> .\" FIXME I find the "previously-functional systems" in the previous
> .\" sentence a little odd (since openat2() ia new sysycall), so I would
> .\" like to clarify a little...
> .\" Are you referring to the scenario where someone might take an
> .\" existing application that uses openat() and replaces the uses
> .\" of openat() with openat2()? In which case, is it correct to
> .\" understand that you mean that one should not just indiscriminately
> .\" add the RESOLVE_NO_XDEV flag to all of the openat2() calls?
> .\" If I'm not on the right track, could you point me in the right
> .\" direction please.
This is mostly meant as a warning to hopefully avoid applications
because the developer didn't realise that system paths may contain
symlinks or bind-mounts. For an application which has switched to
openat2() and then uses RESOLVE_NO_SYMLINKS for a non-security reason,
it's possible that on some distributions (or future versions of a
distribution) that their application will stop working because a system
path suddenly contains a symlink or is a bind-mount.
This was a concern which was brought up on LWN some time ago. If you can
think of a phrasing that makes this more clear, I'd appreciate it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Devi R K reported this issue, and went on to note:
> We have written a program using real time clock and it has been raised to
> the community.
>
> https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/
[...]
Thanks for pointing me at that thread. In particular, the test
program at
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m489d81abdfbb2699743e18c37657311f8d52a4cd
[...]
I think this patch does not really capture the details
properly. The immediately preceding paragraph says:
If the associated clock is either CLOCK_REALTIME or
CLOCK_REALTIME_ALARM, the timer is absolute
(TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
was specified when calling timerfd_settime(), then read(2)
fails with the error ECANCELED if the real-time clock
undergoes a discontinuous change. (This allows the reading
application to discover such discontinuous changes to the
clock.)
Following on from that, I think we should have a paragraph that says
something like:
If the associated clock is either CLOCK_REALTIME or
CLOCK_REALTIME_ALARM, the timer is absolute
(TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
was not specified when calling timerfd_settime(), then a
discontinuous negative change to the clock
(e.g., clock_settime(2)) may cause read(2) to unblock, but
return a value of 0 (i.e., no bytes read), if the clock
change occurs after the time expired, but before the
read(2) on the timerfd file descriptor.
This seems consistent with Thomas's observations in
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m49b78122b573a2749a05b720dc9fa036546db490
==
Thomas Gleixner replied:
Yes, that's correct. Accurate as always!
This is pretty much in line with clock_nanosleep(CLOCK_REALTIME,
TIMER_ABSTIME) which has a similar problem vs. observability in user
space.
clock_nanosleep(2) mutters:
"POSIX.1 specifies that after changing the value of the CLOCK_REALTIME
clock via clock_settime(2), the new clock value shall be used to
determine the time at which a thread blocked on an absolute
clock_nanosleep() will wake up; if the new clock value falls past the
end of the sleep interval, then the clock_nanosleep() call will return
immediately."
which can be interpreted as guarantee that clock_nanosleep() never
returns prematurely, i.e. the assert() in the below code would indicate
a kernel failure:
ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &expiry, NULL);
if (!ret) {
clock_gettime(CLOCK_REALTIME, &now);
assert(now >= expiry);
}
But that assert can trigger when CLOCK_REALTIME was modified after the
timer fired and the kernel decided to wake up the task and let it return
to user space.
clock_nanosleep(..., &expiry)
arm_timer(expires);
schedule();
-> timer interrupt
now = ktime_get_real();
if (expires <= now)
-------------------------------- After this point
wakeup(); clock_settime(2) or
adjtimex(2) which
makes CLOCK_REALTIME
jump back far enough will
cause the above assert
to trigger.
...
return from syscall (retval == 0)
There is no guarantee against clock_settime() coming after the
wakeup. Even if we put another check into the return to user path then
we won't catch a clock_settime() which comes right after that and before
user space invokes clock_gettime().
POSIX spec Issue 7 (2018 edition) says:
The suspension for the absolute clock_nanosleep() function (that is,
with the TIMER_ABSTIME flag set) shall be in effect at least until the
value of the corresponding clock reaches the absolute time specified by
rqtp.
And that's what the kernel implements for clock_nanosleep() and timerfd
behaves exactly the same way.
The wakeup of the waiter, i.e. task blocked in clock_nanosleep(2),
read(2), poll(2), is not happening _before_ the absolute time specified
is reached.
If clock_settime() happens right before the expiry check, then it does
the right thing, but any modification to the clock after the wakeup
cannot be mitigated. At least not in a way which would make the assert()
in the example code above a reliable indicator for a kernel fail.
That's the reason why I rejected the attempt to mitigate that particular
0 tick issue in timerfd as it would just scratch a particular itch but
still not provide any guarantee. So having the '0' return documented is
the right way to go.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: devi R.K <devi.feb27@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For me, source lines such as:
.BR perf_setattr "(2), " perf_event_open "(2), and " clone3 (2).
is harder to read than:
.BR perf_setattr (2),
.BR perf_event_open (2),
and
.BR clone3 (2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are currently three of these forward references (two in
DESCRIPTION, one in ERRORS). This is a little redundant.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than trying to merge the new syscall documentation into
open.2 (which would probably result in the man-page being
incomprehensible), instead the new syscall gets its own dedicated
page with links between open(2) and openat2(2) to avoid
duplicating information such as the list of O_* flags or common
errors.
In addition to describing all of the key flags, information about
the extensibility design is provided so that users can better
understand why they need to pass sizeof(struct open_how) and how
their programs will work across kernels. After some discussions
with David Laight, I also included explicit instructions to zero
the structure to avoid issues when recompiling with new headers.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The fact that CLOCK_PROCESS_CPUTIME_ID and
CLOCK_PROCESS_CPUTIME_ID are not settable isn't a bug,
since POSIX does allow the possibility that these clocks
are not settable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The sixth argument of futex is uaddr2, instead of uaddr.
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 5.6 added the new well-known VMADDR_CID_LOCAL for
local communication.
This patch explains how to use it and removes the legacy
VMADDR_CID_RESERVED no longer available.
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add example programs demonstrating usage of shmget(2), shmat(2),
semget(2), semctl(2), and semop(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the verity attribute for statx(), which was added in
Linux 5.5.
For more context, see the fs-verity documentation:
https://www.kernel.org/doc/html/latest/filesystems/fsverity.html
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix clone3() syscall description for CLONE_PARENT_SETTID: kernel uses
cl_args.parent_tid instead of the specified cl_args.child_tid.
Signed-off-by: Krzysztof Małysa <varqox@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a '.RE' macro to terminate the last .RS block.
There is no change in the output.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man3/strftime.3 (%C): Describe the meaning of %EC conversion
specification.
(%E): Mention the concept of "era" in description.
(%O): Mention that alternative format is related to numeric
representation.
(%y): Describe the meaning of %Ey conversion specification.
(%Y): Describe the meaning of %EY conversion specification.
(.SH DESCRIPTION): Mention that the behaviour of %E modifier is governed
by ERA locale element and provide ja_JP locale as an example.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As it wasn't clear before where this kind of information can be
obtained from.
* man3/strftime.3 (%a, %A, %b, %B, %c, %p, %r, %x, %X): Add information
about the locale elements that can be used to retrieve the relevant
information using nl_langinfo() library call.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Consecutive calls to clock_gettime(CLOCK_MONOTONIC) are guaranteed
to return MONOTONIC values, which means that they either return
the *SAME* time value like the last call, or a later (higher) time
value.
Due to high resolution counters, like TSC on x86, most people see
that the values returned increase, but on other less common
platforms it's less likely that consecutive calls return newer
values, and instead users may unexpectedly get back the SAME time
value.
I think it makes sense to document that people should not expect
to see "always-growing" time values. For example in Debian I've
seen in quite some source packages where return values of
consecutive calls are compared against each other and then the
package build fails if they are equal (e.g. ruby-hitimes, ...).
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/syscalls.2 (.SH DESCRIPTION) <\fBgetdtablesize\fP(2)>: Remove "since
Linux 2.0" part for the osf_getdtablesize note, as syscall is generally
available since Linux 2.0; add line break after the word "as".
(.SH DESCRIPTION) <\fBpwrite\fP(2)>: Add line breaks.
(.SH DESCRIPTION) <\fBvm86old\fP(2)>: Add a line break after "in".
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In many cases, these don't improve readability, and (when stacked)
they sometimes have the side effect of sometimes forcing text
to be justified within a narrow column range.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And add self to copyright, since, by now, the majority of the
text in the page has now been (re)written by me.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improve structure and readability, at the same time incorporating
text and details that were formerly in select_tut(2). Also
move a few details in other parts of the page into DESCRIPTION.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The POSIX situation has been the norm for a long time now,
and including ancient details overcomplicates the page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The discussion about pre-POSIX types for 'timeval' and 'timespec'
is rather old, and these days serves mainly to complicate the
page. Remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text layout is a little hard to parse, with details of
pselect() spread in the main description. Move some of that text
to a headed subsection, and add a one-sentence introduction
describing the purpose of pselect().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One might be tempted to think that realloc() always requests a new
allocation before moving the contents over (at least in the case
where the new size is bigger than the original). This is not the
case; for example, on my system the following program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
void *x = malloc(15);
void *y = malloc(32);
printf("x = %p\n", x);
printf("y = %p\n", y);
printf("usable_size(x) = %lu\n", malloc_usable_size(x));
void *z = realloc(x, 24);
printf("z = %p\n", z);
return 0;
}
prints:
x = 0x1b3a010
y = 0x1b3a030
usable_size(x) = 24
z = 0x1b3a010
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Admittedly, the POSIX specification for exit() also uses octal.
However, 0xFF immediately indicates the lowest 8 bits to me
whereas I had to think a bit about the octal mask.
Cowritten-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This was already shown in an earlier version of the page,
but Adam Borowski's patch replaced it with an alternative.
Probably, it is better to show both possibilities.
Reported-by: "Joseph C. Sible" <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the example snippet, we already have the fd, thus there's no
need to refer to the file by name. And, /proc/ might be not
mounted or not accessible.
Noticed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Supported since fadb4244085cd04fd9c8b3a4b3bc161f506431f3 (4.9),
100..107 are supposed to be bright but this does not yet work
(unmerged patches to do so exist).
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Supported since cec5b2a97a11ade56a701e83044d0a2a984c67b4 (3.16).
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 65d9982d7e523a1a8e7c9af012da0d166f72fc56 (4.17), it follows
xterm rather than common sense and consistency, being the only
command 1..9 where N+20 doesn't undo what N did. As libvte
0.51.90 got changed the same way, this behaviour will probably
stay.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email by Rich Felker:
It came to my attention while reviewing possible breakage with
move to 64-bit time_t that some applications are dereferencing
data in socket control messages (particularly SCM_TIMESTAMP*)
in-place as the message type, rather than memcpy'ing it to
appropriate storage. This necessarily does not work and is not
supportable if the message contains data with greater alignment
requirement than the header. In particular, on 32-bit archs,
cmsghdr has size 12 and alignment 4, but struct timeval and
timespec may have alignment requirement 8.
I found at least ptpd, socat, and ssmping doing this via Debian
Code Search:
https://sources.debian.org/src/ptpd/2.3.1-debian1-4/src/dep/net.c/?hl=1578#L1578https://sources.debian.org/src/socat/1.7.3.3-2/xio-socket.c/?hl=1839#L1839https://sources.debian.org/src/ssmping/0.9.1-3/ssmpngcl.c/?hl=307#L307
and I suspect there are a good deal more out there. On most archs
they won't break, or will visibly break with SIGBUS, but in theory
it's possible that they silently read wrong data and this might
happen on some older and more tiny-embedded-oriented archs.
I think it's clear to someone who understands alignment and who's
thought about it that applications just can't do this, but it
doesn't seem to be documented, and an example in cmsg(3) even
shows access to int payload via *(int *)CMSG_DATA(cmsg) (of course
int is safe because its alignment is <= header alignment, but this
is not mentioned).
Could we add text, and perhaps change the example, to indicate
that in general memcpy needs to be used to copy the payload
to/from a suitable object?
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PTRACE_EVENT_STOP does not always report SIGTRAP, can be the
signal which stopped us
While at it, fix an obvious copy/paste error in
PTRACE_GET_SYSCALL_INFO description.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Somewhat surprisingly, perf_event_open() can fail with EINTR when
trying to enable perf reporting for a uprobe that's already been
configured for use with ftrace. Mention this error in the man
page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch is a minor wording fix in getcwd.3 that changes "In the case getcwd()" to "In the case of getcwd()". This patch should apply cleanly to the master branch of the git repository.
Regards,
Mike Salvatore
From 3b68ad225dbaada2b1b55153dc57807b04531cd6 Mon Sep 17 00:00:00 2001
From: Mike Salvatore <mike.salvatore@canonical.com>
Date: Thu, 16 Jan 2020 16:08:08 -0500
Subject: [PATCH] getcwd.3: wfix
Signed-off-by: Mike Salvatore <mike.salvatore@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The man page contains a trivial bug that's discussed here:
https://stackoverflow.com/q/59628958
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fixes: d8d701003 ("malloc.3: Since glibc 2.29, realloc() is exposed by
defining _DEFAULT_SOURCE")
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc implementation of getpt has actually never been setting
O_NOCTTY when opening /dev/ptmx or BSD ptys.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit e78bbfa82624 ("mm: stop returning -ENOENT from
sys_move_pages() if nothing got migrated") had the effect of
*never* returning -ENOENT, in any situation. So we need to update
the man page to reflect that ENOENT is not a possible return
value.
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PVS-Studio reports that in
char buf[8192];
/* ... */
nh = (struct nlmsghdr *) buf,
the pointer 'buf' is cast to a more strictly aligned pointer type.
This is undefined behaviour. One possible solution to make sure
that buf is correctly aligned is to declare buf as an array of
struct nlmsghdr. Other solutions include allocating the array on
the heap, use an union, or stdalign features. With this patch,
the buffer still contains 8192 bytes.
This was raised on Stack Overflow:
https://stackoverflow.com/questions/57745580/netlink-receive-buffer-alignment
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Historically (before Linux 2.6.23), base_addr was unsigned long
for 32-bit code and unsigned int for 64-bit code. In other words,
it was always a 32-bit value. When the ldt.h header files were
unified, the type became unsigned int on all systems. Update
modify_ldt.2 and set_thread_area.2 accordingly.
Indeed, on x86, the GDT and LDT specify 32-bit bases for code and
data segments, and this has nothing to do with the kernel.
Reported-by: "Metzger, Markus T" <markus.t.metzger@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The programmer should not need to care about the numeric values,
and their inclusion is verbosity.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit 3dd4d40b4208("xfs: Sanity check flags
of Q_XQUOTARM call"), it has added flags check. If it is
not usr,grp,prj quota type, it will report EINVAL.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The definition of the tpacket_auxdata struct in the manpage is not
the same as the definition found in
/include/uapi/linux/if_packet.h.
In particular, instead of a tp_padding field, there is a
tp_vlan_tpid field. An example of a project using this field is
libpcap[1].
[1]: https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-linux.c#L349
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added two missing parentheses
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This surely meant to say clone3() and not clone(3).
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The CLONE_PARENT flag cannot but used by init processes. Let's mention
this in the manpages to prevent surprises.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The introductory paragraphs note that "the calling process" is
normally synonymous with the "the parent process", except in the
case of CLONE_PARENT. The same is also true of CLONE_THREAD.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure 'struct sockaddr_vm' has additional element
'unsigned char svm_zero[]' since version v3.9-rc1
(include/uapi/linux/vm_sockets.h). Linux kernel checks that this
element is zeroed (net/vmw_vsock/vsock_addr.c). Reflect this on
the vsock man page.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=205583
Signed-off-by: Mikhail Golubev <Mikhail.Golubev@opensynergy.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Advertise to userspace that they should use proper pid_t types
for arguments returning a pid.
The kernel-internal struct kernel_clone_args currently uses int
as type and since POSIX mandates that pid_t is a signed integer
type and glibc and friends use int this is not an issue. After
the merge window for v5.5 closes we can switch struct
kernel_clone_args over to using pid_t as well without any danger
in regressing current userspace.
Also note, that the new set tid feature which will be merged for
v5.5 uses pid_t types as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If mmap() fails it will return MAP_FAILED which according to the manpage
is (void *)-1 not NULL.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix two spelling mistakes in manpage describing the clone{2,3}()
syscalls/syscall wrappers.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reword a little to allow for the fact that there are now
*two* reasons to consider using this flag.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christian Brauner suggested mmap(MAP_STACKED), rather than
malloc(), as the canonical way of allocating a stack for the
child of clone(), and Jann Horn noted some reasons why:
Not on Linux, but on OpenBSD, they do use MAP_STACK now
AFAIK; this was announced here:
<http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
Basically they periodically check whether the userspace
stack pointer points into a MAP_STACK region, and if not,
they kill the process. So even if it's a no-op on Linux, it
might make sense to advise people to use the flag to improve
portability? I'm not sure if that's something that belongs
in Linux manpages.
Another reason against malloc() is that when setting up
thread stacks in proper, reliable software, you'll probably
want to place a guard page (in other words, a 4K PROT_NONE
VMA) at the bottom of the stack to reliably catch stack
overflows; and you probably don't want to do that with
malloc, in particular with non-page-aligned allocations.
And the OpenBSD 6.5 manual pages says:
MAP_STACK
Indicate that the mapping is used as a stack. This
flag must be used in combination with MAP_ANON and
MAP_PRIVATE.
And I then noticed that MAP_STACK seems already to be on
FreeBSD for a long time:
MAP_STACK
Map the area as a stack. MAP_ANON is implied.
Offset should be 0, fd must be -1, and prot should
include at least PROT_READ and PROT_WRITE. This
option creates a memory region that grows to at
most len bytes in size, starting from the stack
top and growing down. The stack top is the start‐
ing address returned by the call, plus len bytes.
The bottom of the stack at maximum growth is the
starting address returned by the call.
The entire area is reserved from the point of view
of other mmap() calls, even if not faulted in yet.
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The obsolete CLONE_DETACHED flag has never been properly
documented, but now the discussion CLONE_PIDFD also requires
mention of CLONE_DETACHED. So, properly document CLONE_DETACHED,
and mention its interactions with CLONE_PIDFD.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometimes the descriptions of these flags mentioned the
corresponding section 7 namespace manual page and then the
required capabilities, and sometimes the order was the was
the reverse. Make it consistent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove details of UTS, IPC, and network namespaces that are
already covered in the corresponding namespaces pages in
section 7. This change is for consistency, since corresponding
details were not provided for other namespace types in clone(2)
and these details do not appear in unshare(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After feedback from Christian Brauner [1], I've adjusted a few pieces
of the clone3() text, and also adjusted some of the older text in
the page.
[1] https://lore.kernel.org/linux-man/20191107151941.dw4gtul5lrtax4se@wittgenstein/
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change the text in the introductory paragraph (which was written
20 years ago) to reflect the fact that clone*() does more things
nowadays.
Cowritten-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Adjust references to namespaces(7) to be references to pages
describing specific namespace types.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For Q_QUOTAON, on old kernel we can use quotacheck -ug to generate
quota files. But in current kernel, we can also hide them in
system inodes and indicate them by using "quota" or project
feature.
For user or group quota, we can do as below (etc ext4):
mkfs.ext4 -F -o quota /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, USRQUOTA), /dev/sda5, QFMT_VFS_V0, NULL);
For project quota, we can do as below (etc ext4):
mkfs.ext4 -F -o quota,project /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, PRJQUOTA), /dev/sda5, QFMT_VFS_V0, NULL);
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the given example, the second recvmsg(2) call should receive four bytes,
as the third sendmsg(2) call only sends four.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "flags mask" as a generic term to refer to the clone()
'flags' argument and the clone3() 'cl_args.flags' field.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometime soon, we'll have to add documentation of clone3() to this
page. As a preparatorys step, make the names of the clone()
arguments the same as the fields in the clone3() 'args' struct:
ctid ==> child_pid
ptid ==> parent_tid
newtls ==> tld
child_stack ==> stack
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted in kernel commit 821cc7b0b205c0df64cce59aacc330af251fa8f7,
threads create an ambiguity: what if the calling process's PGID
is changed by another thread while waitpid(0, ...) is blocked?
So, clarify that waitpid(0, ...) means wait for children whose
PGID matches the caller's PGID at the time of the call to
waitpid().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux 5.4, idtype == P_PGID && id == 0 can be used to wait
on children in same process group as caller.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit a9a08845e9acbd224e4ee466f5c1275ed50054e8, the
equivalence between select() and poll()/epoll is defined in terms
of the EPOLL* constants, rather than the POLL* constants.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christian noted that SA_NOCLDWAIT also matters in this scenario.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After review comments from Christian and Daniel.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Thus, pidfd_open() is the preferred way of obtaining a PID
file descriptor.
Notes from a conversation with Christian Brauner:
[[
> A further question... We now have three ways of getting a
> process file descriptor [*]:
>
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
>
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)
It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be and some people pushing for
at least being able to use /proc/<pid> dirfds while ignoring security
problems as soon as you're talking about returning those fds from
clone(); not to mention the additional problems discovered when trying
to implementing this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
As much as we'd have liked to get rid of two different file descriptor
types it doesn't hurt us much and is not that much different from what
we will e.g. see with fsinfo() in the new mount api which needs to work
on regular fds gotten via open()/openat() and mountfds gotten from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that fds you will get from open()/openat() and openat2() are
different types since they have very different behavior; openat2()
returning fds that are non arbitrarily upgradable etc.
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Notes from a conversation on linux-man@ with Christian Brauner:
[[
> [*} By the way, going forward, can we call these things
> "process FDs", rather than "PID FDs"? The API names are what
> they are, an that's okay, but these just as we have socket
> FDs that refer to sockets, directory FDs that refer to
> directories, and timer FDs that refer to timers, and so on,
> these are FDs that refer to *processes*, not "process IDs".
> It's a little thing, but I think the naming better, and
> it's what I propose to use in the manual pages.
The naming was another debate and we ended with this compromise.
I would just clarify that a pidfd is a process file descriptor. I
wouldn't make too much of a deal of hiding the shortcut "pidfd".
People are already using it out there in the wild and it's never
proven a good idea to go against accepted practice.
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel source (kernel/fork.c::copy_process()), there is:
pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
O_RDWR | O_CLOEXEC);
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add an entry for CLONE_PIDFD. This flag is available starting
with kernel 5.2. If specified, a process file descriptor
("pidfd") referring to the child process will be returned in
the ptid argument.
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After my rewriting, almost nothing of the original page remains,
so update the copyright. As the author, I'm relicensing to the
"verbatim" license most commonly used in man pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text stating that "pivot_root() may or may not change the
current root and the current working directory of any processes
or threads which use the old root directory" was written 19 years
ago, before the system call itself was even finalized in the
kernel. The implementation has never changed, and it won't
change in the future, since that would cause user-space breakage.
The existence of that text in DESCRIPTION, followed by qualifying
text stating what the implementation actually does (and has always
done) makes for confusing reading. Therefore, relegate this text
to a historical note in NOTES (so that readers with long memories
can see why the manual page was changed) and rework the text in
DESCRIPTION accordingly.
Reported-by: Philipp Wendler <ml@philippwendler.de>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Eric:
If we are going to be pedantic "filesystem" is really the
wrong concept here. The section about bind mount clarifies
it, but I wonder if there is a better term.
I think I would say: "new_root and put_old must not be on
the same mount as the current root."
I think using "mount" instead of "filesystem" keeps the
concepts less confusing.
As I am reading through this email and seeing text that is
trying to be precise and clear then hitting the term
"filesystem" is a bit jarring. pivot_root doesn't care a
thing for file systems. pivot_root only cares about mounts.
And by a "mount" I mean the thing that you get when you
create a bind mount or you call mount normally.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Philipp Wendler noted that the text on the restrictions for
'new_root' was slightly contradictory, and things could be
clarified and simplified by describing the restrictions on
'new_root' in one place.
Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the text that suggests that pivot_root() changes the root
directory and CWD of process that have directory and CWD on the
old root *filesystem*. Change "filesystem" to "directory".
Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the page more compact by removing the stub subsections that
list the manual pages for the namespace types. And while we're
here, add an explanation of the table columns.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman:
I hate to nitpick, but I am going to say that when I read
the text above the phrase "mount namespace of the process
that created the new mount namespace" feels wrong.
Either you use unshare(2) and the mount namespace of the
process that created the mount namespace changes.
Or you use clone(2) and you could argue it is the new child
that created the mount namespace.
Having a different mount namespace at the end of the
creation operation feels like it makes your phrase confusing
about what the starting mount namespace is. I hate to use
references that are ambiguous when things are changing.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Provide a more detailed explanation of the initialization of
the mount point list in a new mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reid noted a confusion between 'old_root' (my attempt at a
shorthand for the old root point) and 'put_old. Eliminate the
confusion by replacing the shorthand with "old root mount point".
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text talks about "parent mount namespaces", but there
is no such concept. As confirmed by Eric Biederman, what is mean
here is "the mount namespace this mount namespace started as a
copy of". So, this change writes up Eric's description in a more
detailed way.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman notes that the change in commit f646ac88ef was
not strictly necessary for this example, since one of the already
documented requirements is that various mount points must not have
shared propagation, or else pivot_root() will fail. So, simplify
the example.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After creating a new mount namespace, it may be desirable to
disable mount propagation. Give the reader a more explicit
hint about this.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In a recent conversation with Mathieu Desnoyers I was reminded
that we haven't written up anything about how deferred
cancellation and asynchronous signal handlers interact. Mathieu
ran into some of this behaviour and I promised to improve the
documentation in this area to point out the potential pitfall.
Thoughts?
8< --- 8< --- 8<
In pthread_setcancelstate.3, pthreads.7, and signal-safety.7 we
describe that if you have an asynchronous signal nesting over a
deferred cancellation region that any cancellation point in the
signal handler may trigger a cancellation that will behave
as-if it was an asynchronous cancellation. This asynchronous
cancellation may have unexpected effects on the consistency of
the application. Therefore care should be taken with asynchronous
signals and deferred cancellation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update with all the missing errors the syscall can return, the
behaviour the syscall should have w.r.t. to copies within single
files, etc.
[Amir] updates for final released version.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In RETURN VALUE, point reader at subsection noting that the return
value of the raw sched_setaffinity() system call differs from the
wrapper function in glibc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add entries for the new cache geometry values of the auxiliary
vector that got included in the kernel.
Signed-off-by: Raphael Moreira Zinsly <rzinsly@linux.vnet.ibm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using signalfd(2) with epoll(7) and fork(2) can lead to some head
scratching.
It seems that when a signalfd file descriptor is added to epoll
you will only get notifications for signals sent to the process
that added the file descriptor to epoll.
So if you have a signalfd fd registered with epoll and then call
fork(2), perhaps by way of daemon(3) for example. Then you will
find that you no longer get notifications for signals sent to the
newly forked process.
User kentonv on ycombinator[0] explained it thus
"One place where the inconsistency gets weird is when you
use signalfd with epoll. The epoll will flag events on the
signalfd based on the process where the signalfd was
registered with epoll, not the process where the epoll is
being used. One case where this can be surprising is if you
set up a signalfd and an epoll and then fork() for the
purpose of daemonizing -- now you will find that your epoll
mysteriously doesn't deliver any events for the signalfd
despite the signalfd otherwise appearing to function as
expected."
And another post from the same person[1].
And then there is this snippet from this kernel commit message[2]
"If you share epoll fd which contains our sigfd with another
process you should blame yourself. signalfd is "really
special"."
So add a note to the man page that points this out where people
will hopefully find it sooner rather than later!
[0]: https://news.ycombinator.com/item?id=9564975
[1]: https://stackoverflow.com/questions/26701159/sending-signalfd-to-another-process/29751604#29751604
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d80e731ecab420ddcb79ee9d0ac427acbc187b4b
Signed-off-by: Andrew Clayton <andrew@digital-domain.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman noted that my list of directories that could not
have shared propagation was incorrect. I had written that
new_root could not be shared; rather it should be: the parent of
the current root mount point.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Eric Biederman:
The concern from our conversation at the container
mini-summit was that there is a pathology if in your initial
mount namespace all of the mounts are marked MS_SHARED like
systemd does (and is almost necessary if you are going to
use mount propagation), that if new_root itself is MS_SHARED
then unmounting the old_root could propagate.
So I believe the desired sequence is:
>>> chdir(new_root);
+++ mount("", ".", MS_SLAVE | MS_REC, NULL);
>>> pivot_root(".", ".");
>>> umount2(".", MNT_DETACH);
The change to new new_root could be either MS_SLAVE or
MS_PRIVATE. So long as it is not MS_SHARED the mount won't
propagate back to the parent mount namespace.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
LXC uses this [1]. I tested, to double-check, and it works.
The fchdir() dance done by LXC is not needed though:
fchdir(old_root); umount(".", MNT_DETACH); fchdir(new_root);
As far as I can see, just the umount() is sufficient, since,
after pivot_root(), oldi_root is at the top of the stack
of mounts at "/" and thus (so long as CWD is at "/")
the umount will remove the mount at the top of the stack.
Eric Biederman confirmed my understanding by mail, and
Philipp Wendler verified my results by experiment.
[1] See the following commit in LXC:
commit 2d489f9e87fa0cccd8a1762680a43eeff2fe1b6e
Author: Serge Hallyn <serge.hallyn@ubuntu.com>
Date: Sat Sep 20 03:15:44 2014 +0000
pivot_root: switch to a new mechanism (v2)
Helped-by: Eric W. Biederman <ebiederm@xmission.com>
Helped-by: Philipp Wendler <ml@philippwendler.de>
Helped-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After around 19 years, the behavior of pivot_root() has not been
changed, and will almost certainly not change in the future.
So, reword to remove the suggestion that the behavior may change.
Also, more clearly document the effect of pivot_root() on
the calling process's current working directory.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The reference of "Note that this also applies" was vague. So
combine this paragraph with an earlier one to make the linkage
clearer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The idea that there might one day be a mechanism for kernel
threads to explicitly relinquish access to the filesystem never
came to pass (after 20 years), and the presence of text
describing this idea is, IMO, a distraction. So, remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One kernel printk() later, my suspicions seem confirmed: the text
describing the situation where the current root is not a mount
point (because of a chroot()) seems to be bogus. (Perhaps it was
true once upon a time.) In my testing, if the current root is not
a mount point, an EINVAL error results.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this text:
If the current root is not a mount point (e.g., after an
earlier chroot(2) or pivot_root())...
mention of pivot_root() makes no sense, since (as noted in an
earlier commit message for this page) 'new_root' in a previous
pivot_root() must (since Linux 2.4.5) have been a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One of these "bugs" is a philosophical point already covered
elsewhere in the page, while the other is a somewhat obscure joke.
Both pieces are a bit of a distraction, really.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The note that EBUSY is given if a filesystem is already mounted
on 'Iput_old' was never really true. That restriction was in
Linux 2.3.14, but removed in Linux 2.3.99-pre6 so it never made
it to mainline.
The relevant diff in pivot_root() was:
error = -EBUSY;
- if (d_new_root->d_sb == root->d_sb || d_put_old->d_sb == root->d_sb)
+ if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt)
goto out2; /* loop */
- if (d_put_old != d_put_old->d_covers)
- goto out2; /* mount point is busy */
error = -EINVAL;
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the text was written long ago, and hinted that things
might change in the future. However, 20 years have passed
and these details have not changed, so rework the text to
hint at that fact.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see from the source code, the statement that
"No other filesystem may be mounted on 'put_old'" is incorrect.
Even looking at the 2.4.0 source code, there I can't see such
a restriction. In addition, some testing on a 5.0 kernel
(mounting 'put_old' in the new mount namespace just before
pivot_root()) did not result in an error for this case when
calling pivot_root().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pivot_root() only affects the current working directory and root
directory of other processes in the same mount namespace as the
caller.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The restriction on what values may be specified in 'si_code'
apply only when sending a signal to a process other than the
caller itself.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Threads are allowed to switch mount namespaces if the filesystem
details aren't being shared. That's the purpose of the check in
the kernel quoted by the comment:
if (fs->users != 1)
return -EINVAL;
It's been this way since the code was originally merged in v3.8.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since introduction of MAP_SHARED_VALIDATE, in case flags contain
both MAP_PRIVATE and MAP_SHARED, mmap() doesn't fail with EINVAL,
it succeeds.
The reason for that is that MAP_SHARED_VALIDATE is in fact equal
to MAP_PRIVATE | MAP_SHARED.
This is intended behavior, see:
https://lwn.net/Articles/758594/https://lwn.net/Articles/758598/
Signed-off-by: Nikola Forró <nforro@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Things changed in Linux v5.3-rc3 commit 315c69261dd3 from
splitting after template expansion to splitting beforehand.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This requirement on the first digit with the %e format comes from
the ISO C standard. It ensures that all the digits in the output are
significant and forbids output with a precision less than requested.
Signed-off-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Even though the RFW_* flags were first introduced in Linux 4.6,
they could not be used with aio until 4.13 where the aio_rw_flags
field was added to struct iocb (9830f4be159b "fs: Use RWF_* flags
for AIO operations"). Correct the stated version for each flag.
Fixes: 2f72816f86 ("io_submit.2: Add kernel version numbers for various 'aio_rw_flags' flags")
Signed-off-by: Matti Möll <Matti.Moell@opensynergy.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
E2BIG was removed in 2.6.29, we should mark it as deprecated.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add powerpc64 to the calling convention tables.
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PTRACE_GET_SYSCALL_INFO request was introduced by Linux kernel
commit 201766a20e30f982ccfe36bebfad9602c3ff574a aka
v5.3-rc1~65^2~23.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Florin:
In the first table, for the riscv Arch/ABI, the instruction
should be ecall instead of scall.
According the official manual, the instruction has been
renamed.
https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
"The SCALL and SBREAK instructions have been renamed to
ECALL and EBREAK, respectively. Their encoding and
functionality are unchanged."
Reported-by: Florin Blanaru <florin.blanaru96@gmail.com>
Reviewed-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- correct smarpqi to smartpqi
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Describe few recently added options (present in glibc-2.29).
Sort the options a bit more logically and alphabetically.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Simone:
I was looking at version from 2017-09-15 but it's the same
on: http://man7.org/linux/man-pages/man2/statx.2.html
(2019-03-06)
There is reported (about the mask argument) after the list
of constants:
> Note that the kernel does not reject values in mask other
> than the above. Instead, it simply informs the caller which
> values are sup‐ ported by this kernel and filesystem via the
> statx.stx_mask field.
But as reported in the error values, there can be EINVAL if
mask has a reserved valued, and I found a check against
STATX__RESERVED in fs/stat.c for this. So if you use a that
bit (0x80000000U) the kernel will reject the value.
Probably is better to say that the kernel do not enforce the
use of only the listed values, but there are anyway reserved
values so and so you cannot put whatever you want on mask
(that apply to more values than UINT_MAX).
Reported-by: Simone Piccardi <piccardi@truelite.it>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>