Some other socket options that are applicable for TCP and UDP sockets
are documented in socket(7), so help the reader by pointing them at
that page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Thanks to a tip from Keith Packard:
https://keithp.com/blogs/fd-passing/
(Also verified by experiment.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When sending ancillary data, at least one byte of real data should
also be sent. This is strictly necessary for stream sockets
(verified by experiment). It is not required for datagram sockets
on Linux (verified by experiment), but portable applications
should do so.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the ancillary data buffer for receiving SCM_RIGHTS file
descriptors is too small, then the excess file descriptors are
automatically closed in the receiving process. Verified by
experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified by experiment and reading the source code (although
the SCM_RIGHTS case is not so clear to me in the source code).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the buffer supplied to recvmsg() to receive ancillary data is
too small, then the data is truncated and the MSG_CTRUNC flag is
set.
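
The truncation case can be reproduced with a small test along the
following lines. This is only a sketch under stated assumptions: the
helper names (send_fds, recv_reports_ctrunc, demo_ctrunc) and the
NUM_FDS count are invented for illustration, and the receive buffer is
deliberately sized for a single descriptor so that a four-descriptor
SCM_RIGHTS message cannot fit.

```c
#include <assert.h>      /* assert() sanity check and usage note */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define NUM_FDS 4        /* Illustrative: more than the receiver has room for */

/* Send NUM_FDS copies of 'fd' in one SCM_RIGHTS message, plus one
   byte of real data. Returns the sendmsg() result. */
ssize_t send_fds(int sock, int fd)
{
    union {              /* Union ensures suitable alignment */
        char buf[CMSG_SPACE(NUM_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    int fds[NUM_FDS];
    char c = 'x';
    struct iovec iov = { .iov_base = &c, .iov_len = 1 };
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    for (int i = 0; i < NUM_FDS; i++)
        fds[i] = fd;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);        /* Buffer is large enough for one header */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
    memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));

    return sendmsg(sock, &msg, 0);
}

/* Receive with an undersized ancillary buffer. Returns 1 if
   MSG_CTRUNC was set, 0 if not, -1 on error. */
int recv_reports_ctrunc(int sock)
{
    union {
        char buf[CMSG_SPACE(sizeof(int))];   /* Sized for one descriptor */
        struct cmsghdr align;
    } u;
    char c;
    struct iovec iov = { .iov_base = &c, .iov_len = 1 };
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) < 0)
        return -1;

    /* Close whatever descriptors did fit, so the sketch does not leak */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS) {
        size_t n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        for (size_t i = 0; i < n; i++) {
            int fd;
            memcpy(&fd, CMSG_DATA(cmsg) + i * sizeof(int), sizeof(fd));
            close(fd);
        }
    }
    return (msg.msg_flags & MSG_CTRUNC) != 0;
}

/* Hypothetical driver: send over a socket pair, report the flag */
int demo_ctrunc(void)
{
    int sv[2], result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;
    if (send_fds(sv[0], sv[0]) == 1)
        result = recv_reports_ctrunc(sv[1]);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On Linux, assert(demo_ctrunc() == 1) would be expected to hold. Note
that, because of alignment padding, a buffer of CMSG_SPACE(sizeof(int))
bytes may have room for more than one descriptor, which is why the
sketch sends four.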
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file UID does not come into play when creating a v3
security.capability extended attribute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, note that it may be difficult for an application
to know about the existence of duplicate file descriptors.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note a useful performance benefit of EPOLLET: ensuring that
only one of multiple waiters (in epoll_wait()) is woken
up when a file descriptor becomes ready.
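
As a rough illustration of the mechanism, the sketch below registers
the read end of a pipe with EPOLLIN | EPOLLET and calls epoll_wait()
twice. It is single-threaded, so it does not exercise multiple
waiters; it only shows that an edge-triggered readiness event is
reported once and is not repeated to a later epoll_wait() call. The
function name second_wait_count is made up for this example.

```c
#include <assert.h>      /* assert() sanity check and usage note */
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the event count from a second epoll_wait() issued after the
   first call has already reported the readiness edge, or -1 if any
   setup step fails. */
int second_wait_count(void)
{
    int p[2], epfd, n = -1;
    struct epoll_event ev = { 0 }, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | EPOLLET;      /* Edge-triggered notification */
    ev.data.fd = p[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 0) == 1) {
        assert(out.data.fd == p[0]);
        /* No new data has arrived, and the byte is still unread */
        n = epoll_wait(epfd, &out, 1, 0);
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}
```

With edge-triggered notification, assert(second_wait_count() == 0)
would be expected to hold; without EPOLLET the second call would
report the still-readable descriptor again.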
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The parisc gateway page currently exports only 3 functions:
the lws_entry for CAS operations (at 0xb0), the set_thread_pointer
function for usage in glibc (at 0xe0) and the Linux syscall entry
(at 0x100).
All other symbols in the manpage are internal labels and
shouldn't be used directly by userspace or glibc, so drop them
from the man page documentation.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note ENOTDIR error that occurs when requesting a watch on a
nondirectory with IN_ONLYDIR.
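
A minimal way to observe the error, assuming a writable /tmp: the
sketch creates a temporary regular file and asks for a watch on it
with IN_ONLYDIR. The function name and the temporary file template
are invented for illustration.

```c
#include <assert.h>      /* assert() is used only in the usage note */
#include <errno.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Request a watch with IN_ONLYDIR on a regular file. Returns the errno
   value from inotify_add_watch(), 0 if the call unexpectedly
   succeeded, or -1 if the setup failed. */
int onlydir_errno_for_regular_file(void)
{
    char path[] = "/tmp/onlydir-demo-XXXXXX";   /* Hypothetical name */
    int fd, ifd, wd, err;

    fd = mkstemp(path);
    if (fd == -1)
        return -1;

    ifd = inotify_init1(0);
    if (ifd == -1) {
        close(fd);
        unlink(path);
        return -1;
    }

    wd = inotify_add_watch(ifd, path, IN_MODIFY | IN_ONLYDIR);
    err = (wd == -1) ? errno : 0;

    close(ifd);
    close(fd);
    unlink(path);
    return err;
}
```

On Linux, assert(onlydir_errno_for_regular_file() == ENOTDIR) would be
expected to hold.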
Reported-by: Paul Millar <paul.millar@desy.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add background details on ambient and bounding set when
discussing capability transformations during execve(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
capset(2) and capget(2) operate only on the permitted,
effective, and inheritable process capability sets.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When comparing two namespace symlinks to see if they refer to
the same namespace, both the inode number and the device ID
should be compared. This point was already made clear in
ioctl_ns(2), but was missing from this page.
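
The comparison might look something like the following sketch. The
helper name same_namespace is made up here; stat(2) is used so that
the symbolic links are followed.

```c
#include <assert.h>      /* assert() is used only in the usage note */
#include <sys/stat.h>

/* Returns 1 if both paths refer to the same namespace, 0 if they do
   not, and -1 if either stat() call fails. */
int same_namespace(const char *path1, const char *path2)
{
    struct stat a, b;

    if (stat(path1, &a) == -1 || stat(path2, &b) == -1)
        return -1;

    /* Both the device ID and the inode number must match */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

For example, same_namespace("/proc/self/ns/net", "/proc/1/ns/net")
would tell whether the caller shares a network namespace with PID 1,
provided the caller has permission to examine that link.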
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was some confused mixing of concepts between the
two subsections, and some other details that needed fixing up.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Confirmed with Serge Hallyn that: "nsroot" means the UID 0
in the namespace as it would be mapped into the initial userns.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use more consistent layout for lists of functions, and
remove punctuation from the lists to make them less cluttered.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We define in detail the X/Open System Interfaces (i.e., _XOPEN_UNIX)
and all of the X/Open System Interfaces (XSI) Option Groups.
The XSI Option Groups include encryption, realtime, advanced
realtime, realtime threads, advanced realtime threads, tracing,
streams, and legacy interfaces.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Rusty Russell:
I was really surprised that sendmsg() returned EBADF on a valid fd;
turns out I was using sendmsg with SCM_RIGHTS to send a closed fd,
which gives EBADF (see test program below).
But this is only obliquely referenced in unix(7):
SCM_RIGHTS
Send or receive a set of open file descriptors
from another process. The data portion contains
an integer array of the file descriptors. The
passed file descriptors behave as though they have
been created with dup(2).
EBADF is not mentioned in the unix(7) ERRORS (it's mentioned in
dup(2)).
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int fdpass_send(int sockout, int fd)
{
    /* From the cmsg(3) manpage: */
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    struct iovec iov;
    char c = 0;
    union {         /* Ancillary data buffer, wrapped in a union
                       in order to ensure it is suitably aligned */
        char buf[CMSG_SPACE(sizeof(fd))];
        struct cmsghdr align;
    } u;

    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_flags = 0;

    /* Keith Packard reports that 0-length sends don't work, so we
     * always send 1 byte. */
    iov.iov_base = &c;
    iov.iov_len = 1;

    return sendmsg(sockout, &msg, 0);
}

int fdpass_recv(int sockin)
{
    /* From the cmsg(3) manpage: */
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    struct iovec iov;
    int fd;
    char c;
    union {         /* Ancillary data buffer, wrapped in a union
                       in order to ensure it is suitably aligned */
        char buf[CMSG_SPACE(sizeof(fd))];
        struct cmsghdr align;
    } u;

    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    msg.msg_name = NULL;
    msg.msg_namelen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_flags = 0;

    iov.iov_base = &c;
    iov.iov_len = 1;

    if (recvmsg(sockin, &msg, 0) < 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg
        || cmsg->cmsg_len != CMSG_LEN(sizeof(fd))
        || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS) {
        errno = EINVAL;     /* errno values are positive */
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

static void child(int sockfd)
{
    /* The parent's send fails, so no descriptor should arrive */
    int newfd = fdpass_recv(sockfd);
    assert(newfd < 0);
    exit(0);
}

int main(void)
{
    int sv[2];
    int pid, ret;

    assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);

    pid = fork();
    if (pid == 0) {
        close(sv[1]);
        child(sv[0]);
    }

    /* Deliberately pass a descriptor that has already been closed */
    close(sv[0]);
    ret = fdpass_send(sv[1], sv[0]);
    printf("fdpass of bad fd return %i (%s)\n", ret, strerror(errno));
    return 0;
}
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The last argument is passed by value, not reference.
Reported-by: Tomi Salminen <tsalminen@forcepoint.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
gettimeofday() is declared obsolete by POSIX. Mention instead
the modern APIs for working with the realtime clock.
See https://bugzilla.kernel.org/show_bug.cgi?id=199049
Reported-by: Enrique Garcia <cquike@arcor.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
/proc/[pid]/ns/pid_for_children has a value only after the first
child is created in the PID namespace. Verified by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>