The example is misleading. It is not a good idea to unlink an
existing socket because we might try to start the server multiple
times. In this case it is preferable to receive an error.
We could add code that removes the socket when the server process
is killed but that would stretch the example too far.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the kernel version that added SO_TIMESTAMPNS,
and (from the kernel commit) note that SO_TIMESTAMPNS and
SO_TIMESTAMP are mutually exclusive.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
===========
DESCRIPTION
===========
I added a paragraph for ``SO_TIMESTAMPNS``, and modified the
paragraph for ``SIOCGSTAMP`` in relation to ``SO_TIMESTAMPNS``.
I based the documentation on the existing ``SO_TIMESTAMP``
documentation, and
on my experience using ``SO_TIMESTAMPNS``.
I asked a question on Stack Overflow, which helped me understand
``SO_TIMESTAMPNS``:
https://stackoverflow.com/q/60971556/6872717
Testing of the feature being documented
=======================================
I wrote a simple server and client test.
On the client side, I connected a socket specifying
``SOCK_STREAM`` and ``"tcp"``.
Then I enabled timestamps in ns:

.. code-block:: c

    int enable = 1;

    if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &enable,
                   sizeof(enable)))
        goto err;

Then I prepared the msg header:

.. code-block:: c

    char buf[BUFSIZ];
    char cbuf[BUFSIZ];
    struct msghdr msg;
    struct iovec iov;

    memset(buf, 0, ARRAY_BYTES(buf));
    iov.iov_len = ARRAY_BYTES(buf) - 1;
    iov.iov_base = buf;
    msg.msg_name = NULL;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = ARRAY_BYTES(cbuf);

And got some times before and after receiving the msg:

.. code-block:: c

    struct timespec tm_before, tm_recvmsg, tm_after, tm_msg;

    clock_gettime(CLOCK_REALTIME, &tm_before);
    usleep(500000);
    clock_gettime(CLOCK_REALTIME, &tm_recvmsg);
    n = recvmsg(sd, &msg, MSG_WAITALL);
    if (n < 0)
        goto err;
    usleep(1000000);
    clock_gettime(CLOCK_REALTIME, &tm_after);

After that I read the timestamp of the msg:

.. code-block:: c

    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMPNS) {
            memcpy(&tm_msg, CMSG_DATA(cmsg), sizeof(tm_msg));
            break;
        }
    }
    if (!cmsg)
        goto err;

And finally printed the results:

.. code-block:: c

    double tdiff;

    printf("%s\n", buf);
    tdiff = timespec_diff_ms(&tm_before, &tm_recvmsg);
    printf("tm_r - tm_b = %lf ms\n", tdiff);
    tdiff = timespec_diff_ms(&tm_before, &tm_after);
    printf("tm_a - tm_b = %lf ms\n", tdiff);
    tdiff = timespec_diff_ms(&tm_before, &tm_msg);
    printf("tm_m - tm_b = %lf ms\n", tdiff);

Which printed:

::

    asdasdfasdfasdfadfgdfghfthgujty 6, 0;
    tm_r - tm_b = 500.000000 ms
    tm_a - tm_b = 1500.000000 ms
    tm_m - tm_b = 18.000000 ms

System:

::

    Linux debian 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64
    GNU/Linux
    gcc (Debian 9.3.0-8) 9.3.0

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 5.6 added the new well-known VMADDR_CID_LOCAL for
local communication.
This patch explains how to use it and removes the legacy
VMADDR_CID_RESERVED, which is no longer available.
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a '.RE' macro to terminate the last .RS block.
There is no change in the output.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In many cases, these don't improve readability, and (when stacked)
they sometimes have the side effect of forcing text
to be justified within a narrow column range.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PVS-Studio reports that in
char buf[8192];
/* ... */
nh = (struct nlmsghdr *) buf,
the pointer 'buf' is cast to a more strictly aligned pointer type.
This is undefined behaviour. One possible solution to make sure
that buf is correctly aligned is to declare buf as an array of
struct nlmsghdr. Other solutions include allocating the array on
the heap, using a union, or using stdalign features. With this patch,
the buffer still contains 8192 bytes.
This was raised on Stack Overflow:
https://stackoverflow.com/questions/57745580/netlink-receive-buffer-alignment
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The definition of the tpacket_auxdata struct in the manpage is not
the same as the definition found in
/include/uapi/linux/if_packet.h.
In particular, instead of a tp_padding field, there is a
tp_vlan_tpid field. An example of a project using this field is
libpcap[1].
[1]: https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-linux.c#L349
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure 'struct sockaddr_vm' has had an additional element,
'unsigned char svm_zero[]', since version v3.9-rc1
(include/uapi/linux/vm_sockets.h). The Linux kernel checks that this
element is zeroed (net/vmw_vsock/vsock_addr.c). Reflect this on
the vsock man page.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=205583
Signed-off-by: Mikhail Golubev <Mikhail.Golubev@opensynergy.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the given example, the second recvmsg(2) call should receive four bytes,
as the third sendmsg(2) call only sends four.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the page more compact by removing the stub subsections that
list the manual pages for the namespace types. And while we're
here, add an explanation of the table columns.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman:
I hate to nitpick, but I am going to say that when I read
the text above the phrase "mount namespace of the process
that created the new mount namespace" feels wrong.
Either you use unshare(2) and the mount namespace of the
process that created the mount namespace changes.
Or you use clone(2) and you could argue it is the new child
that created the mount namespace.
Having a different mount namespace at the end of the
creation operation feels like it makes your phrase confusing
about what the starting mount namespace is. I hate to use
references that are ambiguous when things are changing.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Provide a more detailed explanation of the initialization of
the mount point list in a new mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text talks about "parent mount namespaces", but there
is no such concept. As confirmed by Eric Biederman, what is meant
here is "the mount namespace this mount namespace started as a
copy of". So, this change writes up Eric's description in a more
detailed way.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After creating a new mount namespace, it may be desirable to
disable mount propagation. Give the reader a more explicit
hint about this.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In a recent conversation with Mathieu Desnoyers I was reminded
that we haven't written up anything about how deferred
cancellation and asynchronous signal handlers interact. Mathieu
ran into some of this behaviour and I promised to improve the
documentation in this area to point out the potential pitfall.
Thoughts?
8< --- 8< --- 8<
In pthread_setcancelstate.3, pthreads.7, and signal-safety.7, we
describe that if an asynchronous signal handler interrupts a
deferred cancellation region, any cancellation point in the
signal handler may trigger a cancellation that behaves as if
it were an asynchronous cancellation. This asynchronous
cancellation may have unexpected effects on the consistency of
the application. Therefore, care should be taken with asynchronous
signals and deferred cancellation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If a mount point is deleted or renamed or removed in one mount
namespace, this will cause an object that is mounted at that
location in another mount namespace to be unmounted (as verified
by experiment). This was implied by the existing text, but it is
better to make this detail explicit.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See fs/xattr.c::xattr_permission():

    /*
     * In the user.* namespace, only regular files and directories can have
     * extended attributes. For sticky directories, only the owner and
     * privileged users can write attributes.
     */
    if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
        if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
            return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
        if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
            (mask & MAY_WRITE) && !inode_owner_or_capable(inode))
            return -EPERM;
    }

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the file descriptors received in SCM_RIGHTS would cause
the process to exceed its RLIMIT_NOFILE limit, the excess
FDs are discarded.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Blackfin port was removed in Linux 4.17. Mention this in the
section concerning Blackfin vDSO functions.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improved the readability of a sentence that describes the use of
FAN_REPORT_FID and how this particular flag influences what data
structures a listening application could expect to receive when
describing an event.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the symbols exported by the RISC-V vDSO which is present
from kernel 4.15 onwards.
See kernel source files in arch/riscv/kernel/vdso.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details relating to the new initialization flag FAN_REPORT_FID have been
added. As part of the FAN_REPORT_FID feature, a new set of event masks is
available and has been documented accordingly.
A simple example program has been added to also support the understanding
and use of FAN_REPORT_FID and directory modification events.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Give the shell in the second cgroup namespace a different prompt,
so as to clearly distinguish the two namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>