The eventfd.2 page has some details on the eventfd2() system call,
which was new in Linux 2.6.27.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 2.6.27 added eventfd2(), which supports a flags argument
that eventfd() did not provide. The flags so far implemented
are EFD_NONBLOCK and EFD_CLOEXEC.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
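To illustrate the flags argument, here is a minimal sketch that creates
an event counter with both flags set in a single call. It assumes a
glibc whose eventfd() wrapper accepts a flags argument; the helper names
make_event_counter() and event_counter_flags_set() are hypothetical.

```c
#include <fcntl.h>
#include <sys/eventfd.h>

/* Hypothetical helper: create an event counter that is nonblocking and
   close-on-exec from the start, without any follow-up fcntl() calls. */
int make_event_counter(unsigned int initval)
{
    return eventfd(initval, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Hypothetical helper: returns 1 if both flags are visible through
   fcntl(), 0 if at least one is missing, and -1 on error. */
int event_counter_flags_set(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);   /* file descriptor flags */
    int flflags = fcntl(fd, F_GETFL);   /* file status flags */

    if (fdflags == -1 || flflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) && (flflags & O_NONBLOCK);
}
```

Setting the flags at creation time avoids the window between creating
the descriptor and a later fcntl() call.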
The current wording suggests that only a single fcntl()
operation is needed to set the FD_CLOEXEC flag, when "proper"
usage would be fcntl(F_GETFD) + fcntl(F_SETFD) to get the
flags and then update them. So change the wording to indicate
that more than one fcntl() operation is required.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
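The two-step usage described above can be sketched as follows. The
helper name set_cloexec() is hypothetical; only fcntl() with F_GETFD and
F_SETFD is taken from the text.

```c
#include <fcntl.h>

/* Hypothetical helper: set the close-on-exec flag while preserving any
   other file descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* first call: fetch current flags */

    if (flags == -1)
        return -1;
    /* second call: write the flags back with FD_CLOEXEC added */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```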
Supply a little more explanation about why the 'size' argument
of epoll_create() is nowadays ignored.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc doesn't (and quite probably won't) include a wrapper for this
system call. Therefore, point out that potential callers will need
to use syscall(2), and rewrite the RETURN VALUE text to show things
as they would be if syscall() is used.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a para to start of page that points out that this is the
low-level, Linux-specific API, and point the reader to posix_fallocate(3)
for the portable API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Describe per-process namespaces, including discussion
of the clone() and unshare() CLONE_NEWNS flag, and /proc/PID/mounts.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The length of this page means that it's becoming difficult to parse
which info is specific to mount() versus umount()/umount2(), so split
the umount material out into its own page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Refer the reader to new text in execve(2) that describes how
(since Linux 2.6.23) RLIMIT_STACK determines the value of ARG_MAX.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX.1-2001 says that the values returned by sysconf()
are constant for the life of the process.
But the fact that, since Linux 2.6.23, ARG_MAX is settable
via RLIMIT_STACK means _SC_ARG_MAX is no longer constant,
since it can change at each execve().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting with Linux 2.6.23, the ARG_MAX limit became settable via
(1/4 of) RLIMIT_STACK. This broke ABI compatibility if RLIMIT_STACK
was set such that ARG_MAX was < 32 pages. Document the fact that
since 2.6.25 Linux imposes a floor on ARG_MAX, so that the old limit
of 32 pages is guaranteed.
For some background on the changes to ARG_MAX in kernels 2.6.23 and
2.6.25, see:
http://sourceware.org/bugzilla/show_bug.cgi?id=5786
http://bugzilla.kernel.org/show_bug.cgi?id=10095
http://thread.gmane.org/gmane.linux.kernel/646709/focus=648101,
checked into 2.6.25 as commit a64e715fc74b1a7dcc5944f848acc38b2c4d4ee2.
Also some reordering/rewording of the discussion of ARG_MAX.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
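The rule described above (one quarter of RLIMIT_STACK, with a floor of
32 pages) can be sketched as plain arithmetic. This is an illustration
of the rule as stated in this message only; it is not the kernel's code,
and it may not match every kernel version. The helper names
arg_limit_from_stack() and estimate_arg_limit() are hypothetical.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Apply the rule from the text: one quarter of the stack limit, but
   never less than 32 pages. Pure arithmetic, no system calls. */
unsigned long long arg_limit_from_stack(unsigned long long stack_limit,
                                        unsigned long long page_size)
{
    unsigned long long quarter = stack_limit / 4;
    unsigned long long minimum = 32 * page_size;

    return quarter > minimum ? quarter : minimum;
}

/* Apply the same rule to the live soft RLIMIT_STACK value. Returns 0
   if the values cannot be read or the stack limit is unlimited. */
unsigned long long estimate_arg_limit(void)
{
    struct rlimit rl;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || getrlimit(RLIMIT_STACK, &rl) == -1)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return arg_limit_from_stack(rl.rlim_cur, (unsigned long long) page);
}
```

Because the soft limit can be changed before each execve(), the result
of estimate_arg_limit() is only meaningful for the next program executed.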
The old sentence sat on its own in an odd place, and anyway the
modern BSDs use the name RLIMIT_NOFILE.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a sentence clarifying that the pending signal set is the union of
the per-thread and process-wide pending signal sets.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
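A minimal sketch of the union behavior, assuming a single-threaded
caller: one blocked signal is sent to the process with kill() and
another to the calling thread with raise(), and sigpending() is expected
to report both. The helper name pending_is_union() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process-directed and a
   thread-directed signal both appear in sigpending(), 0 if not,
   and -1 on error. */
int pending_is_union(void)
{
    sigset_t block, old, pending;
    struct sigaction ign, old1, old2;
    int result;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    kill(getpid(), SIGUSR1);    /* sent to the process as a whole */
    raise(SIGUSR2);             /* sent to the calling thread */

    if (sigpending(&pending) == -1)
        result = -1;
    else
        result = sigismember(&pending, SIGUSR1) == 1 &&
                 sigismember(&pending, SIGUSR2) == 1;

    /* Discard the pending signals by ignoring them briefly, then
       restore the previous dispositions and signal mask. */
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    sigaction(SIGUSR1, &ign, &old1);
    sigaction(SIGUSR2, &ign, &old2);
    sigaction(SIGUSR1, &old1, NULL);
    sigaction(SIGUSR2, &old2, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```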
The page was previously fuzzy about whether these interfaces
have process-wide or per-thread semantics. (E.g., now the
page states that the calling *thread* (not process) is suspended
until the signal is delivered.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These words are slightly bogus: although the interface is obsolete,
for ABI-compatibility reasons, the kernel folk should never be changing
this interface.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc doesn't provide any support for readdir(2),
so remove these header files (which otherwise suggest
that glibc does provide the required pieces).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The location of the fields after d_name varies according to
the size of d_name. We can't properly declare them in C;
therefore, put those fields inside a comment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure isn't currently defined in glibc headers, and the kernel
name of the structure is 'linux_dirent' (as was already used in some,
but not all, places in this page).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Maxin suggested a patch, which I've rewritten and expanded.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Maxin B. John <maxin.john@ap.sony.com>
As at kernel 2.6.27, only ext[234] support d_type.
On other file systems, d_type is always set to DT_UNKNOWN (0).
Reported-by: Ricardo Catalinas Jiménez <jimenezrick@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
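Since d_type may be DT_UNKNOWN, callers need a fallback. The following
is a minimal sketch using the readdir(3) struct dirent as exposed by
glibc, falling back to lstat() when d_type is not filled in. The helper
name entry_is_dir() is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for d_type and the DT_* constants */
#endif
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if the entry is a directory, 0 if it
   is not, and -1 on error. */
int entry_is_dir(const char *dirpath, const struct dirent *ent)
{
    char path[PATH_MAX];
    struct stat sb;
    int n;

    /* Fast path: the file system filled in d_type */
    if (ent->d_type != DT_UNKNOWN)
        return ent->d_type == DT_DIR;

    /* Slow path: d_type is unknown, so ask lstat() instead */
    n = snprintf(path, sizeof(path), "%s/%s", dirpath, ent->d_name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISDIR(sb.st_mode);
}
```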
Make it clear that the POSIX.1 revision that is likely
to affect the feature test macro requirements for futimens() is
POSIX.1-2008.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Nicolas François <nicolas.francois@centraliens.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The times argument points to *an array of* structures, and the
man-page should say that consistently.
(The '&' before sop in the semop() call is unneeded.)
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Summary from mtk: recent work on mlock caused Maxin to notice that
the EAGAIN error was not documented. KOSAKI Motohiro noted
that this behavior is longstanding.
=====
Dear Michael,
As per the mlock(2) implementation bugfix which is present in
Linux 2.6.27-rc2 git commit,
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a477097d9c37c1cf289c7f0257dffcfa42d50197),
the mlock(2) man page should be modified to reflect the latest changes
in the kernel.
See the LKML thread regarding this commit :
http://www.nabble.com/mlock()-return-value-issue-in-kernel-2.6.23.17-td18751601.html
This patch modifies the mlock(2) behaviour as per the SUSv3 specification.
[ENOMEM]
Some or all of the address range specified by the addr and
len arguments does not correspond to valid mapped pages
in the address space of the process.
[EAGAIN]
Some or all of the memory identified by the operation could not
be locked when the call was made.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Maxin B. John <maxin.john@ap.sony.com>
=====
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: "Maxin John" <maxin.john@gmail.com>
Subject: Re: mlock(2) man page modifications
Cc: kosaki.motohiro@jp.fujitsu.com,
"Michael Kerrisk" <mtk.manpages@googlemail.com>, man@vger.kernel.org
Date: Thu, 25 Sep 2008 15:04:49 +0900 (JST)
Hi Maxin,
Thank you for your attention.
I think your point and your patch are right.
However, my patch is trivial regression fix, not behavior change.
An older kernel can return EAGAIN at memory starvation.
my patch has following hunk.
> +++ b/mm/mlock.c
> @@ -78,8 +78,6 @@ success:
>
> mm->locked_vm -= pages;
> out:
> - if (ret == -ENOMEM)
> - ret = -EAGAIN;
In addition, 2.6.11 (oldest code of git repository) has following code.
static int mlock_fixup(struct vm_area_struct * vma,
unsigned long start, unsigned long end, unsigned int newflags)
{
(snip)
vma->vm_mm->locked_vm -= pages;
out:
if (ret == -ENOMEM)
ret = -EAGAIN;
return ret;
}
that behavior is linux mlock's behavior for long long time.
Thanks!
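For callers, the practical consequence is that both ENOMEM and EAGAIN
need handling. A minimal sketch follows; the helper names try_lock() and
describe_mlock_error() are hypothetical, and the descriptions are
paraphrases rather than text from any specification.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 0 on success, otherwise the errno
   value that mlock() reported. */
int try_lock(const void *addr, size_t len)
{
    if (mlock(addr, len) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: map an error from try_lock() to a short
   description. */
const char *describe_mlock_error(int err)
{
    switch (err) {
    case 0:
        return "locked";
    case ENOMEM:
        return "range not fully mapped, or lock limit exceeded";
    case EAGAIN:
        return "some or all of the range could not be locked";
    case EPERM:
        return "caller is not permitted to lock memory";
    default:
        return "other error";
    }
}
```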
The error from getpid() in the presence of clone() occurs
only for a fork-like clone (one that omits CLONE_VM from the flags).
This is a low-level detail, but there is no problem [known-to-me]
for thread-like clone().
getpid() caches the PID after the first call. This relies
on support in the glibc wrappers for fork()/vfork()/clone().
However, if syscall() is used to directly invoke fork()/vfork()/clone(),
the cache is not updated, and getpid() in the child produces the wrong
result.
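One way to sidestep any caching in the wrapper is to ask the kernel
directly. A minimal sketch, assuming a Linux system where SYS_getpid is
defined; the helper names raw_getpid() and getpid_wrapper_is_consistent()
are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* for syscall() */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the PID straight from the kernel,
   bypassing the glibc getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t) syscall(SYS_getpid);
}

/* Hypothetical helper: returns 1 if the wrapper and the kernel agree,
   0 if the wrapper is returning a different value. */
int getpid_wrapper_is_consistent(void)
{
    return getpid() == raw_getpid();
}
```

A child created by invoking the fork or clone system call through
syscall() could call getpid_wrapper_is_consistent() to detect the
problem described above.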
> > Linux, lstat(2) will generally not trigger automounter action, whereas
> > stat(2) will.
>
> I don't understand this last piece. Can you say some more. (I'm not
> familiar with automounter details.)
An automounter (either an explicit one, like autofs, or an implicit
one, such as are used by AFS or NFSv4) is something that triggers
a mount when something is touched.
However, it's undesirable to automount, say, everyone's home
directory just because someone opened up /home in their GUI
browser or typed "ls -l /home". The early automounters simply
didn't list the contents until you accessed it by name;
this is still the case when you can't enumerate a mapping
(say, all DNS names under /net). However, this is extremely
inconvenient, too.
The solution we ended up settling on is to create something
that looks like a directory (i.e. reports S_IFDIR in stat()),
but behaves somewhat like a symlink. In particular, when it is
accessed in a way where a symlink would be dereferenced,
the automount triggers and the directory is mounted. However,
system calls which do *not* cause a symlink to be dereferenced,
like lstat(), also do not cause the automounter to trigger.
This means that "ls -l", or a GUI file browser, can see a list
of directories without causing each one of them to be automounted.
-hpa
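The distinction above can be sketched with two small helpers: one that
inspects a path with lstat(), as a directory lister would, and one that
uses stat(). The helper names are hypothetical, and whether an automount
actually triggers depends on the file system, as described in the
message above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for lstat() */
#endif
#include <sys/stat.h>

/* Hypothetical helper: check whether path looks like a directory
   without dereferencing it, in the way "ls -l" examines entries.
   Returns 1, 0, or -1 on error. */
int looks_like_dir_nofollow(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISDIR(sb.st_mode);
}

/* Hypothetical helper: the same check using stat(), which follows
   symbolic links and, per the text, will generally trigger the
   automounter. Returns 1, 0, or -1 on error. */
int is_dir_follow(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return S_ISDIR(sb.st_mode);
}
```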