Regarding man page documentation of the problem of short sleeps
for setitimer(2)...
> > -- pointers to those threads
>
> http://bugzilla.kernel.org/show_bug.cgi?id=4569
> http://lkml.org/lkml/2005/4/29/163
>
> > -- indications of which kernel versions show this behaviour
>
> AFAIK, all versions as far as x86 is concerned.
> Dunno if it is hardware specific.
>
> > -- a (short) test program to demonstrate it, if you have one.
>
> See the bugzilla bug's attachments
Sorry for the long delay in following this up, but I've got to
it now. I tweaked your suggestions slightly:
{{
Timers will never expire before the requested time,
-instead expiring some short, constant time afterwards, dependent
-on the system timer resolution (currently 10ms).
+but may expire some (short) time afterwards, which depends
+on the system timer resolution and on the system load.
+Upon expiration, a signal will be generated and the timer reset.
+If the timer expires while the process is active (always true for
+On certain systems (including x86), the Linux kernel has a bug which will
+produce premature timer expirations of up to one jiffy under some
+circumstances.
}}
Thanks for this bug report.
Nishanth: if and when your changes are accepted, and the problem
is thus fixed, could you please send me a notification of that
fact, and I can then further amend the manual pages.
Cheers,
Michael
/* itimer_short_interval_bug.c
June 2005
In current Linux kernels, an interval timer set using setitimer()
can sometimes sleep *less* than the specified interval.
This program demonstrates the behaviour by looping through all
itimer values from 1 microsecond upwards, in one microsecond steps.
*/
/* Adapted from a program by Olivier Croquette, June 2005 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>

typedef unsigned long long int u_time_t;    /* in microsecs */

/* Incremented by the SIGALRM handler */
static volatile sig_atomic_t handler_flag;

/* Return the current time as a number of microsecs (0 on error) */
static u_time_t
gettime(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) {
        perror("gettimeofday()");
        return 0;
    }
    return (tv.tv_usec + tv.tv_sec * 1000000LL);
}

static void
handler(int sig, siginfo_t *siginfo, void *context)
{
    handler_flag++;
}

/* Sleep for 'time' microsecs. */
static int
isleep(u_time_t time)
{
    struct itimerval newtv;
    sigset_t sigset;
    struct sigaction sigact;

    if (time == 0)
        return 0;

    /* block SIGALRM */
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGALRM);
    sigprocmask(SIG_BLOCK, &sigset, NULL);

    /* set up our handler */
    sigact.sa_sigaction = handler;
    sigemptyset(&sigact.sa_mask);
    sigact.sa_flags = SA_SIGINFO;
    sigaction(SIGALRM, &sigact, NULL);

    /* one-shot timer: no reload interval */
    newtv.it_interval.tv_sec = 0;
    newtv.it_interval.tv_usec = 0;
    newtv.it_value.tv_sec = time / 1000000;
    newtv.it_value.tv_usec = time % 1000000;
    if (setitimer(ITIMER_REAL, &newtv, NULL) == -1) {
        perror("setitimer(set)");
        return 1;
    }

    /* unblock all signals and wait for SIGALRM to arrive */
    sigemptyset(&sigset);
    sigsuspend(&sigset);
    return 0;
}

int
main(int argc, char *argv[])
{
    u_time_t wait;
    int loop, numLoops;
    u_time_t t1, t2;
    u_time_t actual;
    long long minDiff, maxDiff, totDiff, diff;
    int numFail = 0;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s num-loops\n", argv[0]);
        exit(EXIT_FAILURE);
    } /* if */

    numLoops = atoi(argv[1]);
    setbuf(stdout, NULL);

    for (wait = 1; ; wait++) {
        maxDiff = 0;
        numFail = 0;
        totDiff = 0;
        minDiff = -(long long) wait;

        /* progress indicator */
        if (wait % 10000 == 0)
            printf("%llu\n", wait);

        for (loop = 0; loop < numLoops; loop++) {
            t1 = gettime();
            handler_flag = 0;
            isleep(wait);
            if (handler_flag != 1)
                printf("Problem with the handler flag (%d)!\n",
                       (int) handler_flag);
            t2 = gettime();
            actual = t2 - t1;

            /* Timer fired early: record by how much (diff is negative) */
            if (actual < wait) {
                diff = (long long) actual - (long long) wait;
                if (diff < maxDiff)
                    maxDiff = diff;
                if (diff > minDiff)
                    minDiff = diff;
                totDiff += diff;
                numFail++;
            } /* if */
        } /* for */

        if (numFail > 0)
            printf("%llu: %3d fail (%4lld %4lld; avg=%6.1f)\n",
                   wait, numFail, minDiff, maxDiff,
                   (double) totDiff / numFail);
    } /* for */

    return 0;
} /* main */
> The question came up whether execve of a suid binary while being ptraced
> would fail or ignore the suid part. The answer today seems to be the
> latter:
>
> E.g. (in 2.6.11) security/dummy.c:
>
> static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int
> unsafe)
> {
> if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
> if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) &&
> !capable(CAP_SETUID)) {
> bprm->e_uid = current->uid;
> bprm->e_gid = current->gid;
> }
> }
> }
>
> and fs/exec.c:
>
> void compute_creds(struct linux_binprm *bprm) {
> int unsafe;
>
> unsafe = unsafe_exec(current);
> security_bprm_apply_creds(bprm, unsafe);
> }
>
> static inline int unsafe_exec(struct task_struct *p) {
> int unsafe = 0;
> if (p->ptrace & PT_PTRACED) {
> if (p->ptrace & PT_PTRACE_CAP)
> unsafe |= LSM_UNSAFE_PTRACE_CAP;
> else
> unsafe |= LSM_UNSAFE_PTRACE;
> }
> return unsafe;
> }
>
> That is: if the process that calls execve() is being traced,
> the LSM_UNSAFE_PTRACE bit is set in unsafe and security_bprm_apply_creds()
> will make sure the suid/sgid bits are ignored.
>
> ---
>
> In my man page I do not read anything like that. It says
>
> EPERM The process is being traced, the user is not the superuser and
> the file has an SUID or SGID bit set.
> and
>
> If the current program is being ptraced, a SIGTRAP is sent to it after
> a successful execve().
>
> If the set-uid bit is set on the program file pointed to by filename
> the effective user ID of the calling process is changed to that of the
> owner of the program file.
>
> So, maybe this sentence should be amended to read
>
> If the set-uid bit is set on the program file pointed to by filename
> and the current process is not being ptraced, the effective user ID
> of the calling process is changed to ...
I changed your "current" to "calling" (to be consistent with the
rest of the page), but otherwise applied as you suggest.
The revision will appear in man-pages-2.03, which I can release
any time now. Are you available to do an upload tomorrow?
Added text on permissions required to send signal to owner.
====
Hello Johannes,
> Betreff: Inaccuracy of fcntl man page
> Datum: Mon, 2 May 2005 20:07:12 +0200
Thanks for your note.
Sorry for the delay in getting back to you. I needed to find time
to set aside to look at the details. Now I've finally got there.
> I have attached a simple program
Thanks -- a little program is always helpful.
> that uses the fcntl system call in order
> to kill an arbitrary process of the same user.
> According to the fcntl man page, fcntl(fd,F_SETOWN,pid) returns zero if
> it has success.
Yes.
> If you strace the program while killing for example man running in another
> terminal, you will see that man is killed, but fcntl(fd,F_SETOWN,pid)
> will return EPERM,
I confirm that I see this problem in 2.4, with both Unix domain
and Internet domain sockets.
> where you can only find a very confusing explanation
> in the fcntl man page.
I'm not sure what explanation you mean here. As far as I can
tell, the manual page just doesn't cover this point.
> I have looked into the kernel source of 2.4.30 and found out, that
> net/core/socket::sock_no_fcntl is the culprit if you use fcntl on Unix
> sockets.
Yes, looks that way to me as well. And the 2.2 code looks
similar.
> If pid is not your own pid or not your own process group,
> the system call will return EPERM but will also set the pid
> as you wanted to.
Yes.
> In the 2.6 kernel line, fcntl will react according to the specification in
> the manual page.
Yes.
> If you also think, that one should clarify the return specification of
> fcntl(fd,F_SETOWN,pid) for 2.4.x kernels, please tell me and I will
> provide you with a patch for the manual page.
In fact I've written some new text under BUGS, which describes
the problem:
In Linux 2.4 and earlier, there is a bug that can occur when an
unprivileged process uses F_SETOWN to specify the owner of a
socket file descriptor as a process (group) other than the
caller. In this case, fcntl() can return -1 with errno set to
EPERM, even when the owner process (group) is one that the
caller has permission to send signals to. Despite this error
return, the file descriptor owner is set, and signals will be
sent to the owner.
Does that seem okay to you?
> Furthermore, it would be interesting to write there what permissions
> one needs in order to send signals to processes via fcntl
Good idea. I added the following new text:
Sending a signal to the owner process (group) specified by
F_SETOWN is subject to the same permissions checks as are
described for kill(2), where the sending process is the one that
employs F_SETOWN (but see BUGS below).
====
#define _GNU_SOURCE     /* needed to get the defines */
#include <fcntl.h>      /* in glibc 2.2 this has the needed
                           values defined */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

/**
 * Funnykill kills a program with fcntl
 **/
int
main (int argc, char **argv)
{
    int sockets[2];

    if (argc != 2)
    {
        fprintf (stderr, "Usage: funnykill <pid>\n");
        return 1;
    }

    if (socketpair (AF_UNIX, SOCK_STREAM, 0, sockets) == -1)
    {
        perror ("socketpair");
        return 1;
    }

    /* Errors are reported but not fatal: on Linux 2.4, F_SETOWN can
       return EPERM even though the owner has been set. */
    if (fcntl (sockets[0], F_SETFL, O_ASYNC | O_NONBLOCK) == -1)
        perror ("fcntl-F_SETFL");
    if (fcntl (sockets[0], F_SETOWN, atoi (argv[1])) == -1)
        perror ("fcntl-F_SETOWN");
    // fcntl (sockets[0], F_SETOWN, getpid());
    if (fcntl (sockets[0], F_SETSIG, SIGKILL) == -1)
        perror ("fcntl-F_SETSIG");

    /* Making input available on sockets[0] triggers the signal */
    write (sockets[1], "good bye", 9);
    return 0;
}
.\" For Unix domain sockets and regular files, EPERM is only returned in
.\" Linux 2.2 and earlier; in Linux 2.4 and later, unprivileged can
.\" use mknod() to make these files.