In recent times, a number of other namespace flags have been
added to clone(2). As a result, the generic term "namespace" no
longer clearly identifies the particular namespace controlled by
CLONE_NEWNS; instead, use the term "mount-point namespace".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I submitted a patch to fix this. See the LKML thread
"[patch] Fix type errors in inotify interfaces", 18 Nov 2008
If/when these patches are accepted, the pages need to be updated.
Following Loic Domaigne's suggestion for pthread_setaffinity_np(3), add
similar text to this page noting that the system silently limits
the set of CPUs on which the process actually runs to those CPUs
that are physically present and permitted by cpuset(7).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Loic Domaigne <tech@domaigne.com>
Acked-by: Bert Wesarg <bert.wesarg@googlemail.com>
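A minimal sketch of that silent limiting, assuming Linux and glibc (the helper name request_all_cpus() is hypothetical, not from any page): it asks for every CPU that a cpu_set_t can describe and then reads back the mask the kernel actually applied. The read-back mask, not the requested one, reflects the CPUs that are physically present and permitted by cpuset(7).

```c
#define _GNU_SOURCE          /* for CPU_SET(), CPU_COUNT(), sched_setaffinity() */
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: request every CPU representable in a cpu_set_t
   (most of which will not exist), then read back the mask the kernel
   actually applied to the calling thread.
   Returns the number of CPUs in the effective mask, or -1 on error. */
int request_all_cpus(cpu_set_t *effective)
{
    cpu_set_t requested;

    CPU_ZERO(&requested);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        CPU_SET(cpu, &requested);

    /* CPUs that are not present, or are excluded by the cpuset, are
       dropped without an error as long as at least one usable CPU
       remains in the request. */
    if (sched_setaffinity(0, sizeof(requested), &requested) == -1) {
        perror("sched_setaffinity");
        return -1;
    }

    if (sched_getaffinity(0, sizeof(*effective), effective) == -1) {
        perror("sched_getaffinity");
        return -1;
    }

    return CPU_COUNT(effective);
}
```

On a typical machine the returned count should match the number of usable CPUs, far fewer than CPU_SETSIZE.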
pthread_setaffinity_np() is preferable for setting
thread CPU affinity if using the POSIX threads API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Many older pages use a handle_error() macro to do simple
error handling from system and library function calls.
Switch these pages to do likewise.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
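For reference, the macro used in the man-pages examples looks roughly like the following; the exact definition in a given page may differ, and open_or_die() is only a hypothetical caller.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Print a message describing errno and terminate the program. The
   do { } while (0) wrapper makes the macro behave as a single
   statement, so it is safe after an unbraced "if". */
#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

/* Hypothetical caller: open a file read-only or exit on failure. */
int open_or_die(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        handle_error("open");
    return fd;
}
```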
s/2.6.20/2.6.30/ to fix an earlier typo in the description
of the likely kernel version that will have fully fledged
real-time features.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
s/\.R " "/\\\&/ as a way of getting a blank line after a .SS heading.
(Suggested by Sam Varshavchik <mrsam@courier-mta.com>)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For sched_setaffinity(), the EINVAL error that occurs
if 'cpusetsize' is smaller than the kernel CPU set size
occurs only with kernels before 2.6.9.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
So, when the system call is invoked directly, EINVAL could also occur for bufsiz < 0.
But at the moment, the error text is sufficiently vague
("bufsiz is not positive") that a change to the man page text
is probably not needed.
The page was phrased in a few places to describe the child as
holding the parent's memory until the child does an execve(2)
or an _exit(2). The latter case should really be the more
general case of process termination (i.e., either _exit(2) or
abnormal termination).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Valdis.Kletnieks@vt.edu
Some file systems provide only partial support for 'd_type',
returning DT_UNKNOWN for cases they don't support.
Update the discussion of 'd_type' and DT_UNKNOWN to
reflect this.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
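A sketch of the fallback this implies for callers, assuming glibc (which exposes d_type and the DT_* constants when _DEFAULT_SOURCE or _GNU_SOURCE is defined): trust d_type when it is set, and call lstat(2) only when the file system returns DT_UNKNOWN. The helpers entry_is_dir() and count_subdirs() are hypothetical.

```c
#define _GNU_SOURCE          /* for d_type and the DT_* constants */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if the entry is a directory, 0 if not, -1 on error. */
static int entry_is_dir(const char *dirpath, const struct dirent *de)
{
    if (de->d_type != DT_UNKNOWN)
        return de->d_type == DT_DIR;     /* file system filled it in */

    /* d_type not supported for this entry: ask lstat(2) instead. */
    char path[PATH_MAX];
    struct stat sb;
    if (snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name) >= (int) sizeof(path))
        return -1;
    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISDIR(sb.st_mode);
}

/* Count the subdirectories of dirpath, ignoring "." and "..".
   Returns the count, or -1 if the directory cannot be opened. */
int count_subdirs(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (entry_is_dir(dirpath, de) == 1)
            count++;
    }
    closedir(dir);
    return count;
}
```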
The man page was not explicit about how the memory used by
the child is released back to the parent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Halesh S <halesh.s@india.com>
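The following minimal sketch (the helper vfork_exit_status() is hypothetical) shows the pattern both vfork(2) changes describe: the child borrows the parent's memory, the parent is suspended, and the memory is handed back when the child terminates, here via _exit(2).

```c
#define _GNU_SOURCE          /* for vfork() */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: vfork() a child that immediately terminates with
   the given status, and return that status (or -1 on error). */
int vfork_exit_status(int status)
{
    pid_t pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The child is running in the parent's memory. It must not use
           exit(3), which would flush stdio buffers and run atexit
           handlers in that shared memory; _exit(2) is safe. */
        _exit(status);
    }

    /* The parent resumes only once the child has terminated (or exec'd). */
    int wstatus;
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```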
In some cases, EINVAL can occur if 'optval' is invalid.
Note this, and point the reader to an example in ip(7).
In response to:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=216092
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Christian Grigis <glove@earthling.net>
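As one illustration of an invalid optval, assuming Linux: setting IP_TTL (documented in ip(7)) to a value above 255 fails with EINVAL even though optval and optlen are otherwise well formed. This is our own example, not necessarily the one the page points to, and try_set_ttl() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to set the IPv4 TTL on a fresh UDP socket.
   Returns 0 on success, or the errno value on failure. Linux accepts
   TTLs in the range 1..255 (and -1, meaning "use the default"). */
int try_set_ttl(int ttl)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return errno;

    int err = 0;
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) == -1)
        err = errno;                 /* save errno before close() */
    close(fd);
    return err;
}
```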
Add ".SS Program source" to clearly distinguish shell session and
descriptive text from actual program code.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Strategic calls to sched_yield() can be used to improve
performance, but unnecessary use should be avoided.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text formerly described the operation of sched_yield() in
terms of processes. It should be in terms of threads.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page did not previously explain clearly the scope of the
signal mask that is affected by sa_mask.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
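A small sketch of the scope of sa_mask, assuming a single-threaded Linux process (the handler and helper names are hypothetical): SIGUSR2 is listed in sa_mask for a SIGUSR1 handler, so it is blocked in the calling thread while the handler runs and unblocked again once the handler returns.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* -1 until the handler runs; then 1 if SIGUSR2 was blocked inside it. */
static volatile sig_atomic_t usr2_blocked_in_handler = -1;

/* Record whether SIGUSR2 is blocked while this handler runs.
   sigprocmask(2) and sigismember(3) are async-signal-safe. */
static void usr1_handler(int sig)
{
    sigset_t cur;
    (void) sig;
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0)
        usr2_blocked_in_handler = sigismember(&cur, SIGUSR2);
}

/* Install usr1_handler with SIGUSR2 in sa_mask. That set is added to
   the thread's signal mask only for the duration of the handler; the
   previous mask is restored when the handler returns. */
int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    return sigaction(SIGUSR1, &sa, NULL);
}
```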
The first sentence of the page was vague on the scope of the
attribute changed by sigprocmask(). Reword to make this
clearer and add a sentence in NOTES to explicitly state that
the signal mask is a per-thread attribute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Refer the reader to socket(2) for a description of the SOCK_CLOEXEC
and SOCK_NONBLOCK flags, which are supported by socketpair() since
Linux 2.6.27.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
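A minimal sketch of the two flags, assuming Linux 2.6.27 or later; make_socketpair() is a hypothetical wrapper. Setting both flags at creation time avoids follow-up fcntl(2) calls.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected UNIX-domain stream pair whose descriptors are
   both close-on-exec and nonblocking from the moment they exist, so
   another thread cannot fork() and exec() while they are still
   inheritable. Returns 0 on success, -1 on error. */
int make_socketpair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0, sv);
}
```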
Remove sentence saying that glibc adds a flags argument to the syscall;
that was only relevant for the older eventfd() system call.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove sentence saying that glibc adds a flags argument to the syscall;
that was only relevant for the older signalfd() system call.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 2.6.27 added signalfd4(), which supports a flags argument
that signalfd() did not provide. The flags so far implemented
are SFD_NONBLOCK and SFD_CLOEXEC.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
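A sketch of signalfd() with both flags, assuming glibc, whose signalfd() wrapper uses signalfd4() where the kernel supports it. make_usr1_signalfd() is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block SIGUSR1 and return a nonblocking, close-on-exec signalfd for it
   (or -1 on error). The signal must be blocked so that it stays pending
   and can be read from the descriptor instead of being delivered. */
int make_usr1_signalfd(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;
    return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}
```

With SFD_NONBLOCK, a read(2) when no signal is pending fails with EAGAIN rather than blocking.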
The eventfd.2 page has some details on the eventfd2() system call,
which was new in Linux 2.6.27.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 2.6.27 added eventfd2(), which supports a flags argument
that eventfd() did not provide. The flags so far implemented
are EFD_NONBLOCK and EFD_CLOEXEC.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
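A sketch using both flags, assuming glibc, whose eventfd() wrapper uses eventfd2() where the kernel supports it. The helper eventfd_sum() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: add two values to a fresh eventfd counter and
   read the sum back. Returns 0 on success, -1 on error. */
int eventfd_sum(uint64_t a, uint64_t b, uint64_t *sum)
{
    int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (fd == -1)
        return -1;

    uint64_t again;
    int ok =
        /* Each 8-byte write(2) adds its value to the counter. */
        write(fd, &a, sizeof(a)) == (ssize_t) sizeof(a) &&
        write(fd, &b, sizeof(b)) == (ssize_t) sizeof(b) &&
        /* A read(2) returns the counter and resets it to zero... */
        read(fd, sum, sizeof(*sum)) == (ssize_t) sizeof(*sum) &&
        /* ...so, with EFD_NONBLOCK, a second read fails with EAGAIN
           instead of blocking. */
        read(fd, &again, sizeof(again)) == -1 && errno == EAGAIN;

    close(fd);
    return ok ? 0 : -1;
}
```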
The current wording suggests that only a single fcntl()
operation is needed to set the FD_CLOEXEC flag, when "proper"
usage would be fcntl(F_GETFD) + fcntl(F_SETFD) to get the
flags and then update them. So change the wording to indicate
that more than one fcntl() operation is required.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
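The two-step pattern the new wording describes, as a sketch (set_cloexec() is a hypothetical helper). It is not atomic with respect to another thread calling fork(2) and execve(2) between the two fcntl() calls; where possible, passing O_CLOEXEC to open(2) avoids that window.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Set FD_CLOEXEC on fd without clobbering any other descriptor flags:
   read the current flags with F_GETFD, OR in FD_CLOEXEC, and write the
   result back with F_SETFD. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (flags & FD_CLOEXEC)
        return 0;                    /* already set; nothing to do */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```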
Supply a little more explanation about why the 'size' argument
of epoll_create() is nowadays ignored.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc doesn't (and quite probably won't) include a wrapper for this
system call. Therefore, point out that potential callers will need
to use syscall(2), and rewrite the RETURN VALUE text to show things
as they would be if syscall() is used.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a paragraph to the start of the page pointing out that this is the
low-level, Linux-specific API, and point the reader to posix_fallocate(3)
for the portable API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
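For comparison, a minimal sketch of the portable route via posix_fallocate(3); reserve_space() is hypothetical. Unlike most calls, posix_fallocate() returns an error number directly instead of setting errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Reserve disk space for bytes [0, len) of fd and report the resulting
   file size (the file is extended if it was shorter than len).
   Returns 0 on success, or an error number. */
int reserve_space(int fd, off_t len, off_t *size_after)
{
    int err = posix_fallocate(fd, 0, len);
    if (err != 0)
        return err;

    struct stat sb;
    if (fstat(fd, &sb) == -1)
        return errno;
    *size_after = sb.st_size;
    return 0;
}
```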
Describe per-process mount namespaces, including discussion
of clone(2) and unshare(2) with CLONE_NEWNS, and /proc/PID/mounts.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
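A minimal sketch of reading the per-process mount list, assuming /proc is mounted; count_my_mounts() is hypothetical. Because each mount namespace has its own set of mounts, /proc/self/mounts reflects the caller's namespace rather than a system-wide list.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>

/* Count the entries in the calling process's view of the mount table.
   Returns the count, or -1 if the file cannot be opened. */
int count_my_mounts(void)
{
    FILE *fp = setmntent("/proc/self/mounts", "r");
    if (fp == NULL)
        return -1;

    int n = 0;
    while (getmntent(fp) != NULL)
        n++;
    endmntent(fp);
    return n;
}
```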
The length of this page means that it's becoming difficult to parse
which info is specific to mount() versus umount()/umount2(), so split
the umount material out into its own page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Refer the reader to new text in execve(2) that describes how
(since Linux 2.6.23) RLIMIT_STACK determines the value of ARG_MAX.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX.1-2001 says that the values returned by sysconf()
are constant for the life of the process.
But the fact that, since Linux 2.6.23, ARG_MAX is settable
via RLIMIT_STACK means _SC_ARG_MAX is no longer constant,
since it can change at each execve().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting with Linux 2.6.23, the ARG_MAX limit became settable via
(1/4 of) RLIMIT_STACK. This broke ABI compatibility if RLIMIT_STACK
was set such that ARG_MAX was < 32 pages. Document the fact that
since 2.6.25 Linux imposes a floor on ARG_MAX, so that the old limit
of 32 pages is guaranteed.
For some background on the changes to ARG_MAX in kernels 2.6.23 and
2.6.25, see:
http://sourceware.org/bugzilla/show_bug.cgi?id=5786
http://bugzilla.kernel.org/show_bug.cgi?id=10095
http://thread.gmane.org/gmane.linux.kernel/646709/focus=648101,
checked into 2.6.25 as commit a64e715fc74b1a7dcc5944f848acc38b2c4d4ee2.
Also some reordering/rewording of the discussion of ARG_MAX.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
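To make the arithmetic concrete, here is a model of the rule stated above (1/4 of RLIMIT_STACK, with a floor of 32 pages since 2.6.25). It is not the kernel's or glibc's actual code; arg_max_model() and current_arg_max_model() are hypothetical, and a real program should consult sysconf(_SC_ARG_MAX), which glibc may compute differently.

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <unistd.h>

/* Model: max(stack_limit / 4, 32 pages). Returns -1 for an unlimited
   stack, where this rule by itself gives no finite bound. */
long arg_max_model(rlim_t stack_limit, long page_size)
{
    long min_bytes = 32 * page_size;     /* floor guaranteed since 2.6.25 */
    if (stack_limit == RLIM_INFINITY)
        return -1;
    long quarter = (long) (stack_limit / 4);
    return quarter > min_bytes ? quarter : min_bytes;
}

/* Apply the model to the current RLIMIT_STACK soft limit. */
long current_arg_max_model(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == -1)
        return -1;
    return arg_max_model(rl.rlim_cur, sysconf(_SC_PAGESIZE));
}
```

For example, with the common 8 MiB stack limit and 4 KiB pages the model gives 2 MiB, while a 64 KiB stack limit would give only 16 KiB and is therefore raised to the 128 KiB (32-page) floor.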
The old sentence sat on its own in an odd place, and anyway the
modern BSDs use the name RLIMIT_NOFILE.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a sentence clarifying that the pending signal set is the union of
the per-thread and process-wide pending signal sets.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
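A sketch of that union, assuming a single-threaded Linux process with glibc; pending_is_union() is hypothetical. With both signals blocked, raise(3) makes SIGUSR1 pending for the calling thread, while kill(2) on the process's own PID makes SIGUSR2 pending for the process as a whole; sigpending(2) should report both.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if both signals are reported as pending, 0 if not,
   -1 on error. Dispositions and the signal mask are restored and the
   pending signals discarded before returning (except on error). */
int pending_is_union(void)
{
    sigset_t block, old_mask, pending;
    struct sigaction ign, old_usr1, old_usr2;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1)
        return -1;

    /* raise(): thread-directed; kill(getpid()): process-directed. */
    if (raise(SIGUSR1) != 0 || kill(getpid(), SIGUSR2) == -1 ||
        sigpending(&pending) == -1)
        return -1;

    int result = sigismember(&pending, SIGUSR1) == 1 &&
                 sigismember(&pending, SIGUSR2) == 1;

    /* Setting a disposition to SIG_IGN discards any pending instance
       of that signal; then restore the old dispositions and mask. */
    memset(&ign, 0, sizeof(ign));
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    sigaction(SIGUSR1, &ign, &old_usr1);
    sigaction(SIGUSR2, &ign, &old_usr2);
    sigaction(SIGUSR1, &old_usr1, NULL);
    sigaction(SIGUSR2, &old_usr2, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return result;
}
```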
The page was previously fuzzy about whether these interfaces
have process-wide or per-thread semantics. (E.g., now the
page states that the calling *thread* (not process) is suspended
until the signal is delivered.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These words are slightly bogus: although the interface is obsolete,
for ABI-compatibility reasons, the kernel folk should never be changing
this interface.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc doesn't provide any support for readdir(2),
so remove these header files (which otherwise suggest
that glibc does provide the required pieces).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The location of the fields after d_name varies according to
the size of d_name. We can't properly declare them in C;
therefore, put those fields inside a comment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure isn't currently defined in glibc headers, and the kernel
name of the structure is 'linux_dirent' (as was already used in some,
but not all, places in this page).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
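A sketch of calling the system call directly via syscall(2), assuming Linux. It uses the getdents64 record layout, in which d_type precedes d_name and so can be declared as an ordinary field; the struct is declared locally because glibc headers do not provide it, and dir_contains() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64; each record is d_reclen bytes.
   (In the older linux_dirent record returned by getdents, d_type is
   instead stored in the last byte of the record, after d_name.) */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Return 1 if directory 'path' has an entry called 'name', 0 if not,
   -1 on error. */
int dir_contains(const char *path, const char *name)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    _Alignas(8) char buf[4096];      /* records are 8-byte aligned */
    int found = 0;
    while (!found) {
        /* Returns bytes read, 0 at end of directory, -1 on error. */
        long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (nread == -1) {
            found = -1;
            break;
        }
        if (nread == 0)
            break;
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *) (buf + pos);
            if (strcmp(d->d_name, name) == 0)
                found = 1;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return found;
}
```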
Maxin suggested a patch, which I've rewritten and expanded.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Maxin B. John <maxin.john@ap.sony.com>
As at kernel 2.6.27, only ext[234] support d_type.
On other file systems, d_type is always set to DT_UNKNOWN (0).
Reported-by: Ricardo Catalinas Jiménez <jimenezrick@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make it clear that the POSIX.1 revision that is likely
to affect the feature test macro requirements for futimens() is
POSIX.1-2008.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Nicolas François <nicolas.francois@centraliens.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The times argument points to *an array of* structures, and the
man page should say that consistently.
(The '&' before sop in the semop() call is unneeded.)
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
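To illustrate the "array of structures" wording, a minimal sketch using futimens() with a two-element times array; set_mtime() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>

/* Set only the modification time of the open file fd. times[0] is the
   new access time and times[1] the new modification time; UTIME_OMIT
   leaves the corresponding timestamp unchanged.
   Returns 0 on success, -1 on error. */
int set_mtime(int fd, time_t seconds)
{
    struct timespec times[2];
    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;   /* keep the access time */
    times[1].tv_sec = seconds;
    times[1].tv_nsec = 0;
    return futimens(fd, times);      /* the array decays to a pointer */
}
```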
Summary from mtk: recent work on mlock caused Maxin to notice that
the EAGAIN error was not documented. KOSAKI Motohiro noted
that this behavior is longstanding.
=====
Dear Michael,
As per the mlock(2) implementation bugfix which is present in
Linux 2.6.27-rc2 git commit,
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a477097d9c37c1cf289c7f0257dffcfa42d50197),
the mlock(2) man page should be modified to reflect the latest changes
in the kernel.
See the LKML thread regarding this commit :
http://www.nabble.com/mlock()-return-value-issue-in-kernel-2.6.23.17-td18751601.html
This patch modifies the mlock(2) behaviour as per the SUSv3 specification.
[ENOMEM]
Some or all of the address range specified by the addr and
len arguments does not correspond to valid mapped pages
in the address space of the process.
[EAGAIN]
Some or all of the memory identified by the operation could not
be locked when the call was made.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Maxin B. John <maxin.john@ap.sony.com>
=====
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: "Maxin John" <maxin.john@gmail.com>
Subject: Re: mlock(2) man page modifications
Cc: kosaki.motohiro@jp.fujitsu.com,
"Michael Kerrisk" <mtk.manpages@googlemail.com>, man@vger.kernel.org
Date: Thu, 25 Sep 2008 15:04:49 +0900 (JST)
Hi Maxin,
Thank you for your attention.
I think your point and your patch are right.
However, my patch is a trivial regression fix, not a behavior change.
An older kernel can return EAGAIN on memory starvation.
My patch has the following hunk:
> +++ b/mm/mlock.c
> @@ -78,8 +78,6 @@ success:
>
> mm->locked_vm -= pages;
> out:
> - if (ret == -ENOMEM)
> - ret = -EAGAIN;
In addition, 2.6.11 (the oldest code in the git repository) has the following code:
    static int mlock_fixup(struct vm_area_struct * vma,
            unsigned long start, unsigned long end, unsigned int newflags)
    {
            (snip)
            vma->vm_mm->locked_vm -= pages;
    out:
            if (ret == -ENOMEM)
                    ret = -EAGAIN;
            return ret;
    }
That has been the behavior of Linux mlock() for a long, long time.
Thanks!
The incorrect result from getpid() in the presence of clone() occurs
only for a fork-like clone (one that omits CLONE_VM from the flags).
This is a low-level detail, but there is no problem [known-to-me]
for thread-like clone().
getpid() caches the PID after the first call. This relies
on support in the glibc wrappers for fork()/vfork()/clone().
However, if syscall() is used to directly invoke fork()/vfork()/clone(),
the cache is not updated, and getpid() in the child produces the wrong
result.
> > Linux, lstat(2) will generally not trigger automounter action, whereas
> > stat(2) will.
>
> I don't understand this last piece. Can you say some more. (I'm not
> familiar with automounter details.)
An automounter (either an explicit one, like autofs, or an implicit
one, such as are used by AFS or NFSv4) is something that triggers
a mount when something is touched.
However, it's undesirable to automount, say, everyone's home
directory just because someone opened up /home in their GUI
browser or typed "ls -l /home". The early automounters simply
didn't list the contents until you accessed it by name;
this is still the case when you can't enumerate a mapping
(say, all DNS names under /net). However, this is extremely
inconvenient, too.
The solution we ended up settling on is to create something
that looks like a directory (i.e. reports S_IFDIR in stat()),
but behaves somewhat like a symlink. In particular, when it is
accessed in a way where a symlink would be dereferenced,
the automount triggers and the directory is mounted. However,
system calls which do *not* cause a symlink to be dereferenced,
like lstat(), also do not cause the automounter to trigger.
This means that "ls -l", or a GUI file browser, can see a list
of directories without causing each one of them to be automounted.
-hpa
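A minimal sketch of the difference between the two calls on an ordinary symbolic link; link_and_target_types() is hypothetical. lstat(2) describes the link itself, while stat(2) follows it, which, per the discussion above, is the kind of access that triggers an automount.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* Report the file type of path itself (lstat) and of what it refers to
   (stat). Returns 0 on success, -1 on error. */
int link_and_target_types(const char *path, mode_t *link_type, mode_t *target_type)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)      /* does not follow a final symlink */
        return -1;
    *link_type = sb.st_mode & S_IFMT;

    if (stat(path, &sb) == -1)       /* follows the symlink */
        return -1;
    *target_type = sb.st_mode & S_IFMT;
    return 0;
}
```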
links in 'oldpath'; see also http://lwn.net/Articles/294667.
POSIX.1-2008 makes it implementation-dependent whether or not
'oldpath' is dereferenced if it is a symbolic link.
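Where the behavior matters, linkat(2) lets the caller choose explicitly. A sketch, assuming Linux (hardlink_type() is hypothetical): with AT_SYMLINK_FOLLOW the new name refers to the link's target; without it, Linux creates a hard link to the symbolic link itself.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create newpath as a hard link to oldpath, optionally following a
   symbolic link in oldpath, and return the file type (S_IFLNK,
   S_IFREG, ...) of the new name, or (mode_t) -1 on error. */
mode_t hardlink_type(const char *oldpath, const char *newpath, int follow)
{
    int flags = follow ? AT_SYMLINK_FOLLOW : 0;
    if (linkat(AT_FDCWD, oldpath, AT_FDCWD, newpath, flags) == -1)
        return (mode_t) -1;

    struct stat sb;
    if (lstat(newpath, &sb) == -1)
        return (mode_t) -1;
    return sb.st_mode & S_IFMT;
}
```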
Another attempt to rationalize description of MPOL_DEFAULT.
Since ~2.6.25, the system default memory policy is "local allocation".
MPOL_DEFAULT itself is a request to remove any non-default policy and
"fall back" to the surrounding context. Try to say that without delving
into implementation details.
Update the get_mempolicy(2) man page to add in the description of
the MPOL_F_MEMS_ALLOWED flag, added in 2.6.23.
mtk
Document the additional EINVAL error that occurs if MPOL_F_MEMS_ALLOWED
is specified with either MPOL_F_ADDR or MPOL_F_NODE.
Misc cleanup of get_mempolicy(2):
+ mention that any mode flags will be saved with mode.
I don't bother to document mode flags here because we
already have a pointer to set_mempolicy(2) for more info
on memory policy. mode flags are discussed there.
+ remove some old, obsolete [IMO] NOTES and 'roff comments.
PF_ constants have always had the same values; there never has
been a protocol family that had more than one address family,
and POSIX.1-2001 only specifies the AF_* constants.
nodes outside the task's cpuset, as long as one valid node remains.
Now that cpuset man page exists, we can refer to it. Remove
stale comment regarding lack thereof.
Fix up the error return for nodemask containing nodes disallowed by
the process' current cpuset. Disallowed nodes are now silently ignored,
as long as the nodemask contains at least one node that is on-line,
allowed by the process' cpuset and has memory.
Now that we have a cpuset man page, we can refer to cpusets directly
in the man page text.