man-pages

Commit Graph

Author	SHA1	Message	Date
Michael Kerrisk	5039577811	tcp.7: SEE ALSO: mention Documentation/networking/ip-sysctl.txt Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:41:42 +02:00
Michael Kerrisk	ff5de6ecc4	timerfd_create.2: Refer reader to clock_getres(2) for further details on the clocks Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:24:47 +02:00
Michael Kerrisk	96d8887df7	timer_create.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:24:47 +02:00
Michael Kerrisk	65ff4e238d	clock_getres.2: Minor clarification in description of CLOCK_BOOTTIME Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:24:36 +02:00
Michael Kerrisk	dd6b076aa6	socket.7: Note SCM message types for SO_TIMESTAMP and SO_TIMESTAMPNS Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:07:04 +02:00
Michael Kerrisk	3e472692a6	socket.7: Add some SO_TIMESTAMPNS details. Note the kernel version that added SO_TIMESTAMPNS, and (from the kernel commit) note that SO_TIMESTAMPNS and SO_TIMESTAMP are mutually exclusive. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:07:04 +02:00
Alejandro Colomar	a47d370bb3	socket.7: Document SO_TIMESTAMPNS =========== DESCRIPTION =========== I added a paragraph for ``SO_TIMESTAMP``, and modified the paragraph for ``SIOCGSTAMP`` in relation to ``SO_TIMESTAMPNS``. I based the documentation on the existing ``SO_TIMESTAMP`` documentation, and on my experience using ``SO_TIMESTAMPNS``. I asked a question on stackoverflow, which helped me understand ``SO_TIMESTAMPNS``: https://stackoverflow.com/q/60971556/6872717 Testing of the feature being documented ======================================= I wrote a simple server and client test. In the client side, I connected a socket specifying ``SOCK_STREAM`` and ``"tcp"``. Then I enabled timestamp in ns: .. code-block:: c int enable = 1; if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &enable, sizeof(enable))) goto err; Then I prepared the msg header: .. code-block:: c char buf[BUFSIZ]; char cbuf[BUFSIZ]; struct msghdr msg; struct iovec iov; memset(buf, 0, ARRAY_BYTES(buf)); iov.iov_len = ARRAY_BYTES(buf) - 1; iov.iov_base = buf; msg.msg_name = NULL; msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cbuf; msg.msg_controllen = ARRAY_BYTES(cbuf); And got some times before and after receiving the msg: .. code-block:: c struct timespec tm_before, tm_recvmsg, tm_after, tm_msg; clock_gettime(CLOCK_REALTIME, &tm_before); usleep(500000); clock_gettime(CLOCK_REALTIME, &tm_recvmsg); n = recvmsg(sd, &msg, MSG_WAITALL); if (n < 0) goto err; usleep(1000000); clock_gettime(CLOCK_REALTIME, &tm_after); After that I read the timestamp of the msg: .. code-block:: c struct cmsghdr *cmsg; for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMPNS) { memcpy(&tm_msg, CMSG_DATA(cmsg), sizeof(tm_msg)); break; } } if (!cmsg) goto err; And finally printed the results: .. 
code-block:: c double tdiff; printf("%s\n", buf); tdiff = timespec_diff_ms(&tm_before, &tm_recvmsg); printf("tm_r - tm_b = %lf ms\n", tdiff); tdiff = timespec_diff_ms(&tm_before, &tm_after); printf("tm_a - tm_b = %lf ms\n", tdiff); tdiff = timespec_diff_ms(&tm_before, &tm_msg); printf("tm_m - tm_b = %lf ms\n", tdiff); Which printed: :: asdasdfasdfasdfadfgdfghfthgujty 6, 0; tm_r - tm_b = 500.000000 ms tm_a - tm_b = 1500.000000 ms tm_m - tm_b = 18.000000 ms System: :: Linux debian 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux gcc (Debian 9.3.0-8) 9.3.0 Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:07:04 +02:00
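The test described in the SO_TIMESTAMPNS commit above can be condensed into one self-contained function. This is a minimal sketch, not the author's test program: it assumes Linux, swaps the TCP client/server pair for a UDP socket that sends a datagram to itself over loopback, and matches the control message against SCM_TIMESTAMPNS (the SCM message type noted in the later socket.7 commit) rather than SO_TIMESTAMPNS. The helper name loopback_timestamp_ns() is hypothetical.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/* On Linux, SCM_TIMESTAMPNS has the same value as SO_TIMESTAMPNS. */
#ifndef SCM_TIMESTAMPNS
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#endif

/*
 * Hypothetical helper: send one datagram to ourselves over loopback and
 * store the kernel's receive timestamp in *ts.
 * Returns 0 on success, -1 on any error or if no timestamp was attached.
 */
static int loopback_timestamp_ns(struct timespec *ts)
{
    int ret = -1;
    int enable = 1;
    char buf[64];
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(struct timespec))];
        struct cmsghdr align;
    } ctl;
    struct timeval tmo = { .tv_sec = 2, .tv_usec = 0 };
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd == -1)
        return -1;

    /* Bind to an ephemeral loopback port and learn which one we got. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;
    if (getsockname(sd, (struct sockaddr *)&addr, &alen) == -1)
        goto out;

    /* Ask for nanosecond receive timestamps; do not block forever. */
    if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &enable, sizeof(enable)) == -1)
        goto out;
    if (setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof(tmo)) == -1)
        goto out;

    if (sendto(sd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) != 4)
        goto out;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sd, &msg, 0) == -1)
        goto out;

    /* Walk the ancillary data looking for the timestamp message. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_TIMESTAMPNS) {
            memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
            ret = 0;
            break;
        }
    }
out:
    close(sd);
    return ret;
}
```

A caller would declare a struct timespec, pass its address, and compare the result with clock_gettime(CLOCK_REALTIME, ...) readings taken around the call, as the commit's test does.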
Michael Kerrisk	f3c29937e6	prctl.2: Note semantics of IO_FLUSHER state with respect to fork(2) and execve(2) Reported-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:07:04 +02:00
Michael Kerrisk	22a2e0553b	lseek.2: ERRORS: ENXIO can also occur SEEK_DATA in middle of hole at end of file Quoting Matthew Wilcox: The current text of the lseek manpage is ambiguous about the behaviour of lseek(SEEK_DATA) for a file which is entirely a hole (or the end of the file is a hole and the pos lies within the hole). The draft POSIX language is specific (ENXIO is returned when whence is SEEK_DATA and offset lies within the final hole of the file). Could I trouble you to wordsmith that in? If you want to look at the draft POSIX text, it's here: https://www.austingroupbugs.net/view.php?id=415 Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 10:06:56 +02:00
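The ENXIO case described in the lseek.2 commit above can be observed with a file that consists of a single hole. The following is a minimal sketch, assuming Linux and a filesystem that tracks holes; on a filesystem without hole support the kernel treats the whole file as data, so SEEK_DATA at offset 0 simply returns 0. The helper names seek_data_errno() and make_all_hole_file() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3   /* Linux value, used only if the headers did not expose it */
#endif

/*
 * Hypothetical helper: returns 0 if lseek(fd, offset, SEEK_DATA) finds
 * data at or after 'offset', otherwise the errno value of the failure.
 */
static int seek_data_errno(int fd, off_t offset)
{
    if (lseek(fd, offset, SEEK_DATA) == (off_t)-1)
        return errno;
    return 0;
}

/*
 * Hypothetical helper: create an unlinked temporary file of 'size' bytes
 * without writing any data, so that the file is one big hole on
 * filesystems that support holes. Returns the descriptor, or -1 on error.
 */
static int make_all_hole_file(off_t size)
{
    char path[] = "/tmp/seek_data_XXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);               /* the file disappears when fd is closed */
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a 1 MiB file from make_all_hole_file(), seek_data_errno(fd, 0) is expected to yield ENXIO where holes are supported, and an offset at or beyond the end of the file yields ENXIO in either case.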
Michael Kerrisk	ab366b4567	lseek.2: Minor fix to wording of ENXIO error Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-06 06:59:35 +02:00
Michael Kerrisk	bef940caef	clock_getres.2: Minor tweaks to example Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-04 10:52:16 +02:00
Michael Kerrisk	a04e44bd3a	clock_getres.2: Clarify that CLOCK_MONOTONIC is system-wide Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-04 09:49:03 +02:00
Michael Kerrisk	9d69bebbd6	clock_getres.2: Clarify that CLOCK_TAI is nonsettable Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-04 09:49:03 +02:00
Michael Kerrisk	16fa57813e	clock_getres.2: Add an example program Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-04 09:32:33 +02:00
Michael Kerrisk	a48d19162d	clock_getres.2: wfix: EOPNOTSUPP --> ENOTSUP The two error codes are synonymous, but ENOTSUP is what is used in other related pages. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 21:54:07 +02:00
Eric Rannaud	f873b37560	clock_getres.2: Dynamic POSIX clock devices can return other errors See Linux source as of v5.4: kernel/time/posix-clock.c Signed-off-by: Eric Rannaud <e@nanocritical.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 21:52:01 +02:00
Michael Kerrisk	0f1553b5fd	timerfd_create.2: Note a case where timerfd_settime() can fail with ECANCELED From email discussions with Thomas Gleixner: ====== Hello Thomas, et al, Following on from our discussion of read() on a timerfd [1], I happened to remember a Debian bug report [2] that points out that timer_settime() can fail with the error ECANCELED, which is both surprising and odd (because despite the error, the timer does get updated). The relevant kernel code (I think, from your commit [3]) seems to be the following in timerfd_setup(): if (texp != 0) { if (flags & TFD_TIMER_ABSTIME) texp = timens_ktime_to_host(clockid, texp); if (isalarm(ctx)) { if (flags & TFD_TIMER_ABSTIME) alarm_start(&ctx->t.alarm, texp); else alarm_start_relative(&ctx->t.alarm, texp); } else { hrtimer_start(&ctx->t.tmr, texp, htmode); } if (timerfd_canceled(ctx)) return -ECANCELED; } Using a small test program [4] shows the behavior. The program loops, repeatedly calling timerfd_settime() (with a delay of a few seconds before each call). 
In another terminal window, enter the following command a few times: $ sudo date -s "5 seconds" # Add 5 secs to wall-clock time I see behavior as follows (the /sudo date -s "5 seconds"/ command was executed before loop iterations 0, 2, and 4): [[ $ ./timerfd_settime_ECANCELED 0 Current time is 1585729978 secs, 868510078 nsecs Timer value is now 0 secs, 0 nsecs timerfd_settime() succeeded Timer value is now 9 secs, 999991977 nsecs 1 Current time is 1585729982 secs, 716339545 nsecs Timer value is now 6 secs, 152167990 nsecs timerfd_settime() succeeded Timer value is now 9 secs, 999992940 nsecs 2 Current time is 1585729991 secs, 567377831 nsecs Timer value is now 1 secs, 148959376 nsecs timerfd_settime: Operation canceled Timer value is now 9 secs, 999976294 nsecs 3 Current time is 1585729995 secs, 405385503 nsecs Timer value is now 6 secs, 161989917 nsecs timerfd_settime() succeeded Timer value is now 9 secs, 999993317 nsecs 4 Current time is 1585730004 secs, 225036165 nsecs Timer value is now 1 secs, 180346909 nsecs timerfd_settime: Operation canceled Timer value is now 9 secs, 999984345 nsecs ]] I note from the above. (1) If the wall-clock is changed before the first timerfd_settime() call, the call succeeds. This is of course expected. (2) If the wall-clock is changed after a timerfd_settime() call, then the next timerfd_settime() call fails with ECANCELED. (3) Even if the timerfd_settime() call fails, the timer is still updated(!). Some questions: (a) What is the rationale for timerfd_settime() failing with ECANCELED in this case? (Currently, the manual page says nothing about this.) (b) It seems at the least surprising, but more likely a bug, that timerfd_settime() fails with ECANCELED while at the same time successfully updating the timer value. Your thoughts? 
Thanks, Michael [1] https://lore.kernel.org/lkml/3cbd0919-c82a-cb21-c10f-0498433ba5d1@gmail.com/ [2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947091 [3] commit 99ee5315dac6211e972fa3f23bcc9a0343ff58c4 Author: Thomas Gleixner <tglx@linutronix.de> Date: Wed Apr 27 14:16:42 2011 +0200 timerfd: Allow timers to be cancelled when clock was set [4] /* timerfd_settime_ECANCELED.c */ #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <inttypes.h> #include <sys/timerfd.h> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) int main(int argc, char *argv[]) { struct itimerspec ts, gts; struct timespec start; int tfd = timerfd_create(CLOCK_REALTIME, 0); if (tfd == -1) errExit("timerfd_create"); ts.it_interval.tv_sec = 0; ts.it_interval.tv_nsec = 10; int flags = TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET; for (long j = 0; ; j++) { /* Inject a delay into each loop, by calling getppid() many times */ for (int k = 0; k < 10000000; k++) getppid(); if (j % 1 == 0) printf("%ld\n", j); /* Display the current wall-clock time */ if (clock_gettime(CLOCK_REALTIME, &start) == -1) errExit("clock_gettime"); printf("Current time is %ld secs, %ld nsecs\n", start.tv_sec, start.tv_nsec); /* Before resetting the timer, retrieve its current value so that after the timerfd_settime() call, we can see whether the value has changed */ if (timerfd_gettime(tfd, &gts) == -1) perror("timerfd_gettime"); printf("Timer value is now %ld secs, %ld nsecs\n", gts.it_value.tv_sec, gts.it_value.tv_nsec); /* Reset the timer to now + 10 secs */ ts.it_value.tv_sec = start.tv_sec + 10; ts.it_value.tv_nsec = start.tv_nsec; if (timerfd_settime(tfd, flags, &ts, NULL) == -1) perror("timerfd_settime"); else printf("timerfd_settime() succeeded\n"); /* Display the timer value once again */ if (timerfd_gettime(tfd, &gts) == -1) perror("timerfd_gettime"); printf("Timer value is now %ld secs, %ld nsecs\n", gts.it_value.tv_sec, gts.it_value.tv_nsec); printf("\n"); } } ======= Subject: Re: 
timer_settime() and ECANCELED Date: Wed, 01 Apr 2020 19:42:42 +0200 From: Thomas Gleixner <tglx@linutronix.de> Michael, "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > Following on from our discussion of read() on a timerfd [1], I > happened to remember a Debian bug report [2] that points out that > timer_settime() can fail with the error ECANCELED, which is both > surprising and odd (because despite the error, the timer does get > updated). ... > (1) If the wall-clock is changed before the first timerfd_settime() > call, the call succeeds. This is of course expected. > (2) If the wall-clock is changed after a timerfd_settime() call, then > the next timerfd_settime() call fails with ECANCELED. > (3) Even if the timerfd_settime() call fails, the timer is still updated(!). > > Some questions: > (a) What is the rationale for timerfd_settime() failing with ECANCELED > in this case? (Currently, the manual page says nothing about this.) > (b) It seems at the least surprising, but more likely a bug, that > timerfd_settime() fails with ECANCELED while at the same time > successfully updating the timer value. Really good question and TBH I can't remember why this is implemented in the way it is, but I have a faint memory that at least (a) is intentional. After staring at the code for a while I came up with the following answers: (a): If the clock was set event ("date -s ...") which triggered the cancel was not yet consumed by user space via read(), then that information would get lost because arming the timer to the new value has to reset the state. (b): Arming the timer in that case is indeed very questionable, but it could be argued that because the clock was set event happened with the old expiry value that the new expiry value is not affected. I'd be happy to change that and not arm the timer in the case of a pending cancel, but I fear that some user space already depends on that behaviour. 
Thanks, tglx ====== Subject: Re: timer_settime() and ECANCELED Date: Thu, 02 Apr 2020 10:49:18 +0200 From: Thomas Gleixner <tglx@linutronix.de> To: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes: > On 4/1/20 7:42 PM, Thomas Gleixner wrote: >> (b): Arming the timer in that case is indeed very questionable, but it >> could be argued that because the clock was set event happened with >> the old expiry value that the new expiry value is not affected. >> >> I'd be happy to change that and not arm the timer in the case of a >> pending cancel, but I fear that some user space already depends on >> that behaviour. > > Yes, that's the risk, of course. So, shall we just document all > this in the manual page? I think so. Thanks, tglx ====== Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 21:43:21 +02:00
Michael Kerrisk	b5b0b2882e	prctl.2: Reword description of PR_GET_IO_FLUSHER Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 14:13:51 +02:00
Michael Kerrisk	3872a3d621	prctl.2: Unused args must be zero for PR_GET_IO_FLUSHER and PR_SET_IO_FLUSHER Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 14:08:39 +02:00
Michael Kerrisk	4222606d2a	prctl.2: f Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 14:07:12 +02:00
Michael Kerrisk	91e015066f	prctl.2: Minor tweaks to Mike Christie's patch Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 14:06:28 +02:00
Mike Christie	308eb2f636	prctl.2: Document PR_SET_IO_FLUSHER/PR_GET_IO_FLUSHER This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER prctl commands added to the Linux kernel for 5.6 in commit: commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463 Author: Mike Christie <mchristi@redhat.com> Date: Mon Nov 11 18:19:00 2019 -0600 prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim Reviewed-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 13:59:05 +02:00
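The prctl.2 commits above note that the unused arguments of PR_GET_IO_FLUSHER and PR_SET_IO_FLUSHER must be zero. A minimal sketch of querying the state follows, assuming Linux 5.6 or later; the fallback constants are the values from the kernel's <linux/prctl.h>, and the helper name get_io_flusher() is hypothetical. Without CAP_SYS_RESOURCE the call is expected to fail with EPERM, and on older kernels with EINVAL.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for libc/kernel headers that predate Linux 5.6. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif
#ifndef PR_GET_IO_FLUSHER
#define PR_GET_IO_FLUSHER 58
#endif

/*
 * Hypothetical helper: returns 1 if the caller is in the IO_FLUSHER
 * state, 0 if it is not, or a negated errno value on failure.
 * All unused prctl() arguments are passed explicitly as zero.
 */
static int get_io_flusher(void)
{
    int r = prctl(PR_GET_IO_FLUSHER, 0UL, 0UL, 0UL, 0UL);

    if (r == -1)
        return -errno;
    return r;
}
```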
Michael Kerrisk	98511af299	pthread_getcpuclockid.3: Minor clarification to usage of 'clockid' argument Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	9e099e691e	clock_getcpuclockid.3, pthread_getcpuclockid.3: wfix: use 'clockid' rather than 'clock_id' For consistency across pages. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	ba1c6b2081	clock_getres.2: wfix: s/clk_id/clockid/ throughout Most other manual pages use 'clockid' for the 'clockid_t' argument. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	717096082d	clock_nanosleep.2: wfix: s/clock_id/clockid/ throughout Most other section 2 pages use 'clockid' as the name of the 'clockid_t' argument. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	96d951a401	clock_nanosleep.2, timer_create.2, timerfd_create.2: Add various missing errors Mostly verified by testing and reading the code. There is unfortunately quite a bit of inconsistency across APIs:

                          clock_gettime  clock_settime  clock_nanosleep  timer_create  timerfd_create
CLOCK_BOOTTIME            y              n (EINVAL)     y                y             y
CLOCK_BOOTTIME_ALARM      y              n (EINVAL)     y [1]            y [1]         y [1]
CLOCK_MONOTONIC           y              n (EINVAL)     y                y             y
CLOCK_MONOTONIC_COARSE    y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_MONOTONIC_RAW       y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_REALTIME            y              y              y                y             y
CLOCK_REALTIME_ALARM      y              n (EINVAL)     y [1]            y [1]         y [1]
CLOCK_REALTIME_COARSE     y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_TAI                 y              n (EINVAL)     y                y             n (EINVAL)
CLOCK_PROCESS_CPUTIME_ID  y              n (EINVAL)     y                y             n (EINVAL)
CLOCK_THREAD_CPUTIME_ID   y              n (EINVAL)     n (EINVAL [2])   y             n (EINVAL)
pthread_getcpuclockid()   y              n (EINVAL)     y                y             n (EINVAL)

[1] The caller must have CAP_WAKE_ALARM, or the error EPERM results.
[2] This error is generated in the glibc wrapper.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	04e2e313fc	timerfd_create.2: Rework text for EINVAL for invalid clock ID The error description was crufty. There are more valid clock IDs these days. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	d53b0f4822	clock_nanosleep.2: srcfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	0e7984ff40	clock_nanosleep.2: clock_nanosleep() can also sleep against CLOCK_TAI Presumably since Linux 3.10, when CLOCK_TAI was added to the kernel. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	b24db7cb8a	clock_nanosleep.2: srcfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:25 +02:00
Michael Kerrisk	14df252bf8	clock_getres.2: CLOCK_REALTIME_COARSE is not settable In kernel/time/posix-timers.c, 'CLOCK_REALTIME_COARSE' has no 'timer_set' method. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:57:18 +02:00
Michael Kerrisk	41043c0bd6	clock_getres.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:42:54 +02:00
Michael Kerrisk	ac90b58942	clock_getres.2: tfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:42:54 +02:00
Michael Kerrisk	eb6567fb00	clock_getres.2: Add CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:42:54 +02:00
Michael Kerrisk	da8a95bca1	timer_create.2: timer_create(2) also supports CLOCK_TAI Presumably (and from a quick glance at the source code) since Linux 3.10, when CLOCK_TAI was introduced. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:42:54 +02:00
Michael Kerrisk	0e4b87c4fd	timer_create.2: Mention clock_getres(2) for further details on the various clocks Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 12:42:54 +02:00
Michael Kerrisk	966051ca74	clock_nanosleep.2: clock_nanosleep() also supports CLOCK_BOOTTIME Presumably (and from a quick glance at the source code) since Linux 2.6.39, when CLOCK_BOOTTIME was introduced. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-02 09:34:37 +02:00
Michael Kerrisk	2c16f1bc28	clock_getres.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-01 21:30:37 +02:00
Michael Kerrisk	066dcd09cb	timerfd_create.2: wfix See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947091 Reported-by: Marc Lehmann <debian-reportbug@plan9.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-01 10:04:32 +02:00
Michael Kerrisk	372b58573a	openat2.2: srcfix: remove a FIXME Aleksa Sarai is okay with my text changes. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-01 08:36:49 +02:00
Michael Kerrisk	08ba10a6d5	openat2.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-01 08:18:45 +02:00
Michael Kerrisk	b4e1568256	openat2.2: Improve text describing caveat for use of RESOLVE_NO_XDEV From email discussions with Aleksa Sarai: > .\" FIXME I find the "previously-functional systems" in the previous > .\" sentence a little odd (since openat2() is a new syscall), so I would > .\" like to clarify a little... > .\" Are you referring to the scenario where someone might take an > .\" existing application that uses openat() and replaces the uses > .\" of openat() with openat2()? In which case, is it correct to > .\" understand that you mean that one should not just indiscriminately > .\" add the RESOLVE_NO_XDEV flag to all of the openat2() calls? > .\" If I'm not on the right track, could you point me in the right > .\" direction please. This is mostly meant as a warning to hopefully avoid applications breaking because the developer didn't realise that system paths may contain symlinks or bind-mounts. For an application which has switched to openat2() and then uses RESOLVE_NO_SYMLINKS for a non-security reason, it's possible that on some distributions (or future versions of a distribution) that their application will stop working because a system path suddenly contains a symlink or is a bind-mount. This was a concern which was brought up on LWN some time ago. If you can think of a phrasing that makes this more clear, I'd appreciate it. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-04-01 08:16:40 +02:00
Michael Kerrisk	c85ebb3c94	openat2.2: Various tweaks to the discussion of 'resolve' flags Some tweaks inspired by https://lwn.net/Articles/796868/ Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-31 09:50:48 +02:00
Michael Kerrisk	e31d5bfd36	openat2.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-31 09:11:20 +02:00
Michael Kerrisk	193f7fb272	openat2.2: Place 'resolve' flags in alphabetical order Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-31 09:10:05 +02:00
Michael Kerrisk	1ae24555ba	timerfd_create.2: Negative changes to CLOCK_REALTIME may cause read() to return 0 Devi R K reported this issue, and went on to note: > We have written a program using real time clock and it has been raised to > the community. > > https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/ [...] Thanks for pointing me at that thread. In particular, the test program at https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m489d81abdfbb2699743e18c37657311f8d52a4cd [...] I think this patch does not really capture the details properly. The immediately preceding paragraph says: If the associated clock is either CLOCK_REALTIME or CLOCK_REALTIME_ALARM, the timer is absolute (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET was specified when calling timerfd_settime(), then read(2) fails with the error ECANCELED if the real-time clock undergoes a discontinuous change. (This allows the reading application to discover such discontinuous changes to the clock.) Following on from that, I think we should have a paragraph that says something like: If the associated clock is either CLOCK_REALTIME or CLOCK_REALTIME_ALARM, the timer is absolute (TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET was not specified when calling timerfd_settime(), then a discontinuous negative change to the clock (e.g., clock_settime(2)) may cause read(2) to unblock, but return a value of 0 (i.e., no bytes read), if the clock change occurs after the time expired, but before the read(2) on the timerfd file descriptor. This seems consistent with Thomas's observations in https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m49b78122b573a2749a05b720dc9fa036546db490 == Thomas Gleixner replied: Yes, that's correct. Accurate as always! This is pretty much in line with clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME) which has a similar problem vs. 
observability in user space. clock_nanosleep(2) mutters: "POSIX.1 specifies that after changing the value of the CLOCK_REALTIME clock via clock_settime(2), the new clock value shall be used to determine the time at which a thread blocked on an absolute clock_nanosleep() will wake up; if the new clock value falls past the end of the sleep interval, then the clock_nanosleep() call will return immediately." which can be interpreted as guarantee that clock_nanosleep() never returns prematurely, i.e. the assert() in the below code would indicate a kernel failure: ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &expiry, NULL); if (!ret) { clock_gettime(CLOCK_REALTIME, &now); assert(now >= expiry); } But that assert can trigger when CLOCK_REALTIME was modified after the timer fired and the kernel decided to wake up the task and let it return to user space. clock_nanosleep(..., &expiry) arm_timer(expires); schedule(); -> timer interrupt now = ktime_get_real(); if (expires <= now) -------------------------------- After this point wakeup(); clock_settime(2) or adjtimex(2) which makes CLOCK_REALTIME jump back far enough will cause the above assert to trigger. ... return from syscall (retval == 0) There is no guarantee against clock_settime() coming after the wakeup. Even if we put another check into the return to user path then we won't catch a clock_settime() which comes right after that and before user space invokes clock_gettime(). POSIX spec Issue 7 (2018 edition) says: The suspension for the absolute clock_nanosleep() function (that is, with the TIMER_ABSTIME flag set) shall be in effect at least until the value of the corresponding clock reaches the absolute time specified by rqtp. And that's what the kernel implements for clock_nanosleep() and timerfd behaves exactly the same way. The wakeup of the waiter, i.e. task blocked in clock_nanosleep(2), read(2), poll(2), is not happening _before_ the absolute time specified is reached. 
If clock_settime() happens right before the expiry check, then it does the right thing, but any modification to the clock after the wakeup cannot be mitigated. At least not in a way which would make the assert() in the example code above a reliable indicator for a kernel fail. That's the reason why I rejected the attempt to mitigate that particular 0 tick issue in timerfd as it would just scratch a particular itch but still not provide any guarantee. So having the '0' return documented is the right way to go. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: devi R.K <devi.feb27@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-30 22:52:58 +02:00
Michael Kerrisk	1f4cf8e85e	openat2.2: srcfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-30 22:41:25 +02:00
Michael Kerrisk	6b6505af4d	path_resolution.7: srcfix: semantic newlines Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-30 22:36:13 +02:00
Aleksa Sarai	61d24bff30	path_resolution.7: Update to mention openat2(2) features Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2020-03-30 22:35:33 +02:00

20686 Commits, All Branches