2014-04-28 10:25:55 +00:00
|
|
|
.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
|
2014-05-12 10:07:58 +00:00
|
|
|
.\" and Copyright (C) 2014 Peter Zijlstra <peterz@infradead.org>
|
|
|
|
.\" and Copyright (C) 2014 Juri Lelli <juri.lelli@gmail.com>
|
2014-04-28 10:25:55 +00:00
|
|
|
.\" Various pieces from the old sched_setscheduler(2) page
|
|
|
|
.\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
|
|
|
|
.\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
|
|
|
|
.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
|
|
|
|
.\"
|
|
|
|
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
|
|
|
|
.\" This is free documentation; you can redistribute it and/or
|
|
|
|
.\" modify it under the terms of the GNU General Public License as
|
|
|
|
.\" published by the Free Software Foundation; either version 2 of
|
|
|
|
.\" the License, or (at your option) any later version.
|
|
|
|
.\"
|
|
|
|
.\" The GNU General Public License's references to "object code"
|
|
|
|
.\" and "executables" are to be interpreted as the output of any
|
|
|
|
.\" document formatting or typesetting system, including
|
|
|
|
.\" intermediate and printed output.
|
|
|
|
.\"
|
|
|
|
.\" This manual is distributed in the hope that it will be useful,
|
|
|
|
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
.\" GNU General Public License for more details.
|
|
|
|
.\"
|
|
|
|
.\" You should have received a copy of the GNU General Public
|
|
|
|
.\" License along with this manual; if not, see
|
|
|
|
.\" <http://www.gnu.org/licenses/>.
|
|
|
|
.\" %%%LICENSE_END
|
|
|
|
.\"
|
|
|
|
.\" Worth looking at: http://rt.wiki.kernel.org/index.php
|
|
|
|
.\"
|
ldd.1, execve.2, fanotify_init.2, fanotify_mark.2, getrlimit.2, open.2, readlink.2, sched_setattr.2, sched_setscheduler.2, shmget.2, syscalls.2, vmsplice.2, dlopen.3, fseeko.3, getgrent.3, mq_getattr.3, mq_open.3, realpath.3, armscii-8.7, ascii.7, iso_8859-1.7, iso_8859-10.7, iso_8859-11.7, iso_8859-13.7, iso_8859-14.7, iso_8859-15.7, iso_8859-16.7, iso_8859-2.7, iso_8859-3.7, iso_8859-4.7, iso_8859-5.7, iso_8859-6.7, iso_8859-7.7, iso_8859-8.7, iso_8859-9.7, koi8-r.7, koi8-u.7, sched.7, ld.so.8: tstamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2014-10-02 21:47:51 +00:00
|
|
|
.TH SCHED 7 2014-10-02 "Linux" "Linux Programmer's Manual"
|
2014-04-28 10:25:55 +00:00
|
|
|
.SH NAME
|
|
|
|
sched \- overview of scheduling APIs
|
|
|
|
.SH DESCRIPTION
|
2014-04-28 10:27:56 +00:00
|
|
|
.SS API summary
|
|
|
|
The Linux scheduling APIs are as follows:
|
|
|
|
.TP
|
|
|
|
.BR sched_setscheduler (2)
|
|
|
|
Set the scheduling policy and parameters of a specified thread.
|
|
|
|
.TP
|
|
|
|
.BR sched_getscheduler (2)
|
|
|
|
Return the scheduling policy of a specified thread.
|
|
|
|
.TP
|
|
|
|
.BR sched_setparam (2)
|
|
|
|
Set the scheduling parameters of a specified thread.
|
|
|
|
.TP
|
|
|
|
.BR sched_getparam (2)
|
|
|
|
Fetch the scheduling parameters of a specified thread.
|
|
|
|
.TP
|
|
|
|
.BR sched_get_priority_max (2)
|
|
|
|
Return the minimum priority available in a specified scheduling policy.
|
|
|
|
.TP
|
|
|
|
.BR sched_get_priority_min (2)
|
|
|
|
Return the maximum priority available in a specified scheduling policy.
|
|
|
|
.TP
|
2014-05-10 05:23:15 +00:00
|
|
|
.BR sched_rr_get_interval (2)
|
2014-04-28 10:27:56 +00:00
|
|
|
Fetch the quantum used for threads that are scheduled under
|
|
|
|
the "round-robin" scheduling policy.
|
|
|
|
.TP
|
|
|
|
.BR sched_yield (2)
|
|
|
|
Cause the caller to relinquish the CPU,
|
|
|
|
so that some other thread be executed.
|
|
|
|
.TP
|
|
|
|
.BR sched_setaffinity (2)
|
|
|
|
(Linux-specific)
|
|
|
|
Set the CPU affinity of a specified thread.
|
|
|
|
.TP
|
|
|
|
.BR sched_getaffinity (2)
|
|
|
|
(Linux-specific)
|
2014-05-10 14:27:15 +00:00
|
|
|
Get the CPU affinity of a specified thread.
|
2014-04-28 10:27:56 +00:00
|
|
|
.TP
|
|
|
|
.BR sched_setattr (2)
|
2014-05-12 07:21:37 +00:00
|
|
|
Set the scheduling policy and parameters of a specified thread.
|
|
|
|
This (Linux-specific) system call provides a superset of the functionality of
|
|
|
|
.BR sched_setscheduler (2)
|
|
|
|
and
|
|
|
|
.BR sched_setparam (2).
|
2014-04-28 10:27:56 +00:00
|
|
|
.TP
|
|
|
|
.BR sched_getattr (2)
|
2014-05-12 07:21:37 +00:00
|
|
|
Fetch the scheduling policy and parameters of a specified thread.
|
|
|
|
This (Linux-specific) system call provides a superset of the functionality of
|
|
|
|
.BR sched_getscheduler (2)
|
|
|
|
and
|
|
|
|
.BR sched_getparam (2).
|
2014-04-28 10:27:56 +00:00
|
|
|
.\"
|
2014-04-28 10:25:55 +00:00
|
|
|
.SS Scheduling policies
|
|
|
|
The scheduler is the kernel component that decides which runnable thread
|
|
|
|
will be executed by the CPU next.
|
|
|
|
Each thread has an associated scheduling policy and a \fIstatic\fP
|
2014-05-12 07:24:22 +00:00
|
|
|
scheduling priority,
|
|
|
|
.IR sched_priority .
|
2014-05-12 07:17:41 +00:00
|
|
|
The scheduler makes its decisions based on knowledge of the scheduling
|
2014-04-28 10:25:55 +00:00
|
|
|
policy and static priority of all threads on the system.
|
|
|
|
|
|
|
|
For threads scheduled under one of the normal scheduling policies
|
|
|
|
(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
|
|
|
|
\fIsched_priority\fP is not used in scheduling
|
|
|
|
decisions (it must be specified as 0).
|
|
|
|
|
|
|
|
Processes scheduled under one of the real-time policies
|
|
|
|
(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
|
|
|
|
\fIsched_priority\fP value in the range 1 (low) to 99 (high).
|
|
|
|
(As the numbers imply, real-time threads always have higher priority
|
|
|
|
than normal threads.)
|
|
|
|
Note well: POSIX.1-2001 requires an implementation to support only a
|
|
|
|
minimum 32 distinct priority levels for the real-time policies,
|
|
|
|
and some systems supply just this minimum.
|
|
|
|
Portable programs should use
|
|
|
|
.BR sched_get_priority_min (2)
|
|
|
|
and
|
|
|
|
.BR sched_get_priority_max (2)
|
|
|
|
to find the range of priorities supported for a particular policy.
|
|
|
|
|
|
|
|
Conceptually, the scheduler maintains a list of runnable
|
|
|
|
threads for each possible \fIsched_priority\fP value.
|
|
|
|
In order to determine which thread runs next, the scheduler looks for
|
|
|
|
the nonempty list with the highest static priority and selects the
|
|
|
|
thread at the head of this list.
|
|
|
|
|
|
|
|
A thread's scheduling policy determines
|
|
|
|
where it will be inserted into the list of threads
|
|
|
|
with equal static priority and how it will move inside this list.
|
|
|
|
|
|
|
|
All scheduling is preemptive: if a thread with a higher static
|
|
|
|
priority becomes ready to run, the currently running thread
|
|
|
|
will be preempted and
|
|
|
|
returned to the wait list for its static priority level.
|
|
|
|
The scheduling policy determines the
|
|
|
|
ordering only within the list of runnable threads with equal static
|
|
|
|
priority.
|
|
|
|
.SS SCHED_FIFO: First in-first out scheduling
|
|
|
|
\fBSCHED_FIFO\fP can be used only with static priorities higher than
|
|
|
|
0, which means that when a \fBSCHED_FIFO\fP threads becomes runnable,
|
|
|
|
it will always immediately preempt any currently running
|
|
|
|
\fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
|
|
|
|
\fBSCHED_FIFO\fP is a simple scheduling
|
|
|
|
algorithm without time slicing.
|
|
|
|
For threads scheduled under the
|
|
|
|
\fBSCHED_FIFO\fP policy, the following rules apply:
|
|
|
|
.IP * 3
|
|
|
|
A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
|
|
|
|
higher priority will stay at the head of the list for its priority and
|
|
|
|
will resume execution as soon as all threads of higher priority are
|
|
|
|
blocked again.
|
|
|
|
.IP *
|
|
|
|
When a \fBSCHED_FIFO\fP thread becomes runnable, it
|
|
|
|
will be inserted at the end of the list for its priority.
|
|
|
|
.IP *
|
|
|
|
A call to
|
2014-05-12 07:28:26 +00:00
|
|
|
.BR sched_setscheduler (2),
|
|
|
|
.BR sched_setparam (2),
|
2014-04-28 10:25:55 +00:00
|
|
|
or
|
2014-05-12 07:28:26 +00:00
|
|
|
.BR sched_setattr (2)
|
2014-04-28 10:25:55 +00:00
|
|
|
will put the
|
|
|
|
\fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
|
|
|
|
\fIpid\fP at the start of the list if it was runnable.
|
|
|
|
As a consequence, it may preempt the currently running thread if
|
|
|
|
it has the same priority.
|
|
|
|
(POSIX.1-2001 specifies that the thread should go to the end
|
|
|
|
of the list.)
|
|
|
|
.\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
|
|
|
|
.\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
|
|
|
|
.IP *
|
|
|
|
A thread calling
|
|
|
|
.BR sched_yield (2)
|
|
|
|
will be put at the end of the list.
|
|
|
|
.PP
|
|
|
|
No other events will move a thread
|
|
|
|
scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
|
|
|
|
runnable threads with equal static priority.
|
|
|
|
|
|
|
|
A \fBSCHED_FIFO\fP
|
|
|
|
thread runs until either it is blocked by an I/O request, it is
|
|
|
|
preempted by a higher priority thread, or it calls
|
|
|
|
.BR sched_yield (2).
|
|
|
|
.SS SCHED_RR: Round-robin scheduling
|
|
|
|
\fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
|
|
|
|
Everything
|
|
|
|
described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
|
|
|
|
except that each thread is allowed to run only for a maximum time
|
|
|
|
quantum.
|
|
|
|
If a \fBSCHED_RR\fP thread has been running for a time
|
|
|
|
period equal to or longer than the time quantum, it will be put at the
|
|
|
|
end of the list for its priority.
|
|
|
|
A \fBSCHED_RR\fP thread that has
|
|
|
|
been preempted by a higher priority thread and subsequently resumes
|
|
|
|
execution as a running thread will complete the unexpired portion of
|
|
|
|
its round-robin time quantum.
|
|
|
|
The length of the time quantum can be
|
|
|
|
retrieved using
|
|
|
|
.BR sched_rr_get_interval (2).
|
|
|
|
.\" On Linux 2.4, the length of the RR interval is influenced
|
|
|
|
.\" by the process nice value -- MTK
|
|
|
|
.\"
|
2014-05-12 10:07:58 +00:00
|
|
|
.SS SCHED_DEADLINE: Sporadic task model deadline scheduling
|
2014-05-12 13:18:43 +00:00
|
|
|
Since version 3.14, Linux provides a deadline scheduling policy
|
|
|
|
.RB ( SCHED_DEADLINE ).
|
|
|
|
This policy is currently implemented using
|
|
|
|
GEDF (Global Earliest Deadline First)
|
|
|
|
in conjunction with CBS (Constant Bandwidth Server).
|
|
|
|
To set and fetch this policy and associated attributes,
|
|
|
|
one must use the Linux-specific
|
|
|
|
.BR sched_setattr (2)
|
|
|
|
and
|
|
|
|
.BR sched_getattr (2)
|
|
|
|
system calls.
|
2014-05-12 10:07:58 +00:00
|
|
|
|
|
|
|
A sporadic task is one that has a sequence of jobs, where each
|
2014-05-12 10:30:49 +00:00
|
|
|
job is activated at most once per period.
|
2014-05-12 13:18:43 +00:00
|
|
|
Each job also has a
|
|
|
|
.IR "relative deadline" ,
|
2014-05-12 10:30:49 +00:00
|
|
|
before which it should finish execution, and a
|
2014-05-12 13:18:43 +00:00
|
|
|
.IR "computation time" ,
|
|
|
|
which is the CPU time necessary for executing the job.
|
|
|
|
The moment when a task wakes up
|
|
|
|
because a new job has to be executed is called the
|
|
|
|
.IR "arrival time"
|
|
|
|
(also referred to as the request time or release time).
|
|
|
|
The
|
|
|
|
.IR "start time"
|
|
|
|
is the time at which a task starts its execution.
|
|
|
|
The
|
2014-05-21 11:16:14 +00:00
|
|
|
.I "absolute deadline"
|
2014-05-12 13:18:43 +00:00
|
|
|
is thus obtained by adding the relative deadline to the arrival time.
|
2014-05-12 10:07:58 +00:00
|
|
|
|
|
|
|
The following diagram clarifies these terms:
|
|
|
|
|
2014-05-12 10:30:49 +00:00
|
|
|
.in +4n
|
2014-05-12 10:07:58 +00:00
|
|
|
.nf
|
2014-05-13 17:55:13 +00:00
|
|
|
arrival/wakeup absolute deadline
|
|
|
|
| start time |
|
|
|
|
| | |
|
|
|
|
v v v
|
|
|
|
-----x--------xooooooooooooooooo--------x--------x---
|
2014-05-12 10:31:29 +00:00
|
|
|
|<- comp. time ->|
|
2014-05-13 17:55:13 +00:00
|
|
|
|<------- relative deadline ------>|
|
|
|
|
|<-------------- period ------------------->|
|
2014-05-12 10:07:58 +00:00
|
|
|
.fi
|
2014-05-12 10:30:49 +00:00
|
|
|
.in
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
When setting a
|
2014-05-12 10:30:49 +00:00
|
|
|
.B SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
policy for a thread using
|
|
|
|
.BR sched_setattr (2),
|
|
|
|
one can specify three parameters:
|
|
|
|
.IR Runtime ,
|
|
|
|
.IR Deadline ,
|
|
|
|
and
|
|
|
|
.IR Period .
|
|
|
|
These parameters do not necessarily correspond to the aforementioned terms:
|
|
|
|
usual practice is to set Runtime to something bigger than the average
|
|
|
|
computation time (or worst-case execution time for hard real-time tasks),
|
|
|
|
Deadline to the relative deadline, and Period to the period of the task.
|
|
|
|
Thus, for
|
|
|
|
.BR SCHED_DEADLINE
|
|
|
|
scheduling, we have:
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 10:30:49 +00:00
|
|
|
.in +4n
|
2014-05-12 10:07:58 +00:00
|
|
|
.nf
|
2014-05-13 17:55:13 +00:00
|
|
|
arrival/wakeup absolute deadline
|
|
|
|
| start time |
|
|
|
|
| | |
|
|
|
|
v v v
|
|
|
|
-----x--------xooooooooooooooooo--------x--------x---
|
|
|
|
|<-- Runtime ------->|
|
|
|
|
|<----------- Deadline ----------->|
|
|
|
|
|<-------------- Period ------------------->|
|
2014-05-12 10:07:58 +00:00
|
|
|
.fi
|
2014-05-12 10:30:49 +00:00
|
|
|
.in
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
The three deadline-scheduling parameters correspond to the
|
|
|
|
.IR sched_runtime ,
|
|
|
|
.IR sched_deadline ,
|
|
|
|
and
|
|
|
|
.IR sched_period
|
|
|
|
fields of the
|
|
|
|
.I sched_attr
|
|
|
|
structure; see
|
|
|
|
.BR sched_setattr (2).
|
|
|
|
These fields express value in nanoseconds.
|
|
|
|
.\" FIXME It looks as though specifying sched_period as 0 means
|
adjtimex.2, bind.2, cacheflush.2, clone.2, fallocate.2, fanotify_init.2, fanotify_mark.2, flock.2, futex.2, getdents.2, getpriority.2, getrlimit.2, gettid.2, gettimeofday.2, ioprio_set.2, kexec_load.2, migrate_pages.2, modify_ldt.2, mount.2, move_pages.2, mprotect.2, msgop.2, nfsservctl.2, perf_event_open.2, pread.2, ptrace.2, recvmmsg.2, rename.2, restart_syscall.2, sched_setattr.2, send.2, shmop.2, shutdown.2, sigaction.2, signalfd.2, syscalls.2, timer_create.2, timerfd_create.2, tkill.2, vmsplice.2, wait.2, aio_init.3, confstr.3, exit.3, fmemopen.3, fopen.3, getaddrinfo.3, getauxval.3, getspnam.3, isalpha.3, isatty.3, mallinfo.3, malloc.3, mallopt.3, psignal.3, pthread_attr_setinheritsched.3, qecvt.3, queue.3, rtnetlink.3, strerror.3, strftime.3, toupper.3, towlower.3, towupper.3, initrd.4, locale.5, proc.5, bootparam.7, capabilities.7, ddp.7, fanotify.7, icmp.7, inotify.7, ip.7, ipv6.7, netdevice.7, netlink.7, path_resolution.7, rtld-audit.7, rtnetlink.7, sched.7, signal.7, socket.7, svipc.7, tcp.7, unix.7, ld.so.8: srcfix: Update FIXMEs
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2014-08-21 21:47:44 +00:00
|
|
|
.\" "make sched_period the same as sched_deadline".
|
2014-05-12 13:18:43 +00:00
|
|
|
.\" This needs to be documented.
|
|
|
|
If
|
|
|
|
.IR sched_period
|
|
|
|
is specified as 0, then it is made the same as
|
|
|
|
.IR sched_deadline .
|
|
|
|
|
|
|
|
The kernel requires that:
|
|
|
|
|
|
|
|
sched_runtime <= sched_deadline <= sched_period
|
|
|
|
|
|
|
|
.\" See __checkparam_dl in kernel/sched/core.c
|
|
|
|
In addition, under the current implementation,
|
|
|
|
all of the parameter values must be at least 1024
|
|
|
|
(i.e., just over one microsecond,
|
2014-05-13 17:55:13 +00:00
|
|
|
which is the resolution of the implementation), and less than 2^63.
|
2014-05-12 13:18:43 +00:00
|
|
|
If any of these checks fails,
|
|
|
|
.BR sched_setattr (2)
|
|
|
|
fails with the error
|
|
|
|
.BR EINVAL .
|
2014-05-12 10:07:58 +00:00
|
|
|
|
|
|
|
The CBS guarantees non-interference between tasks, by throttling
|
2014-05-12 13:18:43 +00:00
|
|
|
threads that attempt to over-run their specified Runtime.
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
To ensure deadline scheduling guarantees,
|
2014-05-21 11:16:14 +00:00
|
|
|
the kernel must prevent situations where the set of
|
2014-05-12 10:30:49 +00:00
|
|
|
.B SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
threads is not feasible (schedulable) within the given constraints.
|
|
|
|
The kernel thus performs an admittance test when setting or changing
|
2014-05-12 10:30:49 +00:00
|
|
|
.B SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
policy and attributes.
|
|
|
|
This admission test calculates whether the change is feasible;
|
|
|
|
if it is not
|
2014-05-12 10:30:49 +00:00
|
|
|
.BR sched_setattr (2)
|
2014-05-12 13:18:43 +00:00
|
|
|
fails with the error
|
|
|
|
.BR EBUSY .
|
2014-05-12 10:07:58 +00:00
|
|
|
|
|
|
|
For example, it is required (but not necessarily sufficient) for
|
2014-05-12 13:18:43 +00:00
|
|
|
the total utilization to be less than or equal to the total number of
|
|
|
|
CPUs available, where, since each thread can maximally run for
|
|
|
|
Runtime per Period, that thread's utilization is its
|
|
|
|
Runtime divided by its Period.
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
In order to fulfil the guarantees that are made when
|
|
|
|
a thread is admitted to the
|
|
|
|
.BR SCHED_DEADLINE
|
|
|
|
policy,
|
2014-05-12 10:30:49 +00:00
|
|
|
.BR SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
threads are the highest priority (user controllable) threads in the
|
|
|
|
system; if any
|
2014-05-12 10:30:49 +00:00
|
|
|
.BR SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
thread is runnable,
|
|
|
|
it will preempt any thread scheduled under one of the other policies.
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
A call to
|
2014-05-12 10:30:49 +00:00
|
|
|
.BR fork (2)
|
2014-05-12 13:18:43 +00:00
|
|
|
by a thread scheduled under the
|
|
|
|
.B SCHED_DEADLINE
|
|
|
|
policy will fail with the error
|
|
|
|
.BR EAGAIN ,
|
|
|
|
unless the thread has its reset-on-fork flag set (see below).
|
2014-05-12 10:07:58 +00:00
|
|
|
|
2014-05-12 10:30:49 +00:00
|
|
|
A
|
|
|
|
.B SCHED_DEADLINE
|
2014-05-12 13:18:43 +00:00
|
|
|
thread that calls
|
2014-05-12 10:30:49 +00:00
|
|
|
.BR sched_yield (2)
|
2014-05-12 13:18:43 +00:00
|
|
|
will yield the current job and wait for a new period to begin.
|
|
|
|
.\"
|
|
|
|
.\" FIXME Calling sched_getparam() on a SCHED_DEADLINE thread
|
|
|
|
.\" fails with EINVAL, but sched_getscheduler() succeeds.
|
|
|
|
.\" Is that intended? (Why?)
|
2014-05-12 10:30:49 +00:00
|
|
|
.\"
|
2014-04-28 10:25:55 +00:00
|
|
|
.SS SCHED_OTHER: Default Linux time-sharing scheduling
|
|
|
|
\fBSCHED_OTHER\fP can be used at only static priority 0.
|
|
|
|
\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
|
|
|
|
intended for all threads that do not require the special
|
|
|
|
real-time mechanisms.
|
|
|
|
The thread to run is chosen from the static
|
|
|
|
priority 0 list based on a \fIdynamic\fP priority that is determined only
|
|
|
|
inside this list.
|
|
|
|
The dynamic priority is based on the nice value (set by
|
2014-05-12 07:52:50 +00:00
|
|
|
.BR nice (2),
|
|
|
|
.BR setpriority (2),
|
2014-04-28 10:25:55 +00:00
|
|
|
or
|
2014-05-12 07:52:50 +00:00
|
|
|
.BR sched_setattr (2))
|
2014-04-28 10:25:55 +00:00
|
|
|
and increased for each time quantum the thread is ready to run,
|
|
|
|
but denied to run by the scheduler.
|
|
|
|
This ensures fair progress among all \fBSCHED_OTHER\fP threads.
|
|
|
|
.\"
|
|
|
|
.SS SCHED_BATCH: Scheduling batch processes
|
|
|
|
(Since Linux 2.6.16.)
|
|
|
|
\fBSCHED_BATCH\fP can be used only at static priority 0.
|
|
|
|
This policy is similar to \fBSCHED_OTHER\fP in that it schedules
|
|
|
|
the thread according to its dynamic priority
|
|
|
|
(based on the nice value).
|
|
|
|
The difference is that this policy
|
|
|
|
will cause the scheduler to always assume
|
|
|
|
that the thread is CPU-intensive.
|
|
|
|
Consequently, the scheduler will apply a small scheduling
|
2014-06-16 20:25:04 +00:00
|
|
|
penalty with respect to wakeup behavior,
|
2014-04-28 10:25:55 +00:00
|
|
|
so that this thread is mildly disfavored in scheduling decisions.
|
|
|
|
|
|
|
|
.\" The following paragraph is drawn largely from the text that
|
|
|
|
.\" accompanied Ingo Molnar's patch for the implementation of
|
|
|
|
.\" SCHED_BATCH.
|
|
|
|
.\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
|
|
|
|
This policy is useful for workloads that are noninteractive,
|
|
|
|
but do not want to lower their nice value,
|
|
|
|
and for workloads that want a deterministic scheduling policy without
|
|
|
|
interactivity causing extra preemptions (between the workload's tasks).
|
|
|
|
.\"
|
|
|
|
.SS SCHED_IDLE: Scheduling very low priority jobs
|
|
|
|
(Since Linux 2.6.23.)
|
|
|
|
\fBSCHED_IDLE\fP can be used only at static priority 0;
|
|
|
|
the process nice value has no influence for this policy.
|
|
|
|
|
|
|
|
This policy is intended for running jobs at extremely low
|
|
|
|
priority (lower even than a +19 nice value with the
|
|
|
|
.B SCHED_OTHER
|
|
|
|
or
|
|
|
|
.B SCHED_BATCH
|
|
|
|
policies).
|
|
|
|
.\"
|
|
|
|
.SS Resetting scheduling policy for child processes
|
2014-05-10 05:22:47 +00:00
|
|
|
Each thread has a reset-on-fork scheduling flag.
|
|
|
|
When this flag is set, children created by
|
|
|
|
.BR fork (2)
|
|
|
|
do not inherit privileged scheduling policies.
|
|
|
|
The reset-on-fork flag can be set by either:
|
|
|
|
.IP * 3
|
|
|
|
ORing the
|
2014-04-28 10:25:55 +00:00
|
|
|
.B SCHED_RESET_ON_FORK
|
2014-05-10 05:22:47 +00:00
|
|
|
flag into the
|
2014-04-28 10:25:55 +00:00
|
|
|
.I policy
|
2014-05-10 05:22:47 +00:00
|
|
|
argument when calling
|
|
|
|
.BR sched_setscheduler (2)
|
|
|
|
(since Linux 2.6.32);
|
|
|
|
or
|
|
|
|
.IP *
|
|
|
|
specifying the
|
|
|
|
.B SCHED_FLAG_RESET_ON_FORK
|
|
|
|
flag in
|
|
|
|
.IR attr.sched_flags
|
2014-04-28 10:25:55 +00:00
|
|
|
when calling
|
2014-05-10 05:22:47 +00:00
|
|
|
.BR sched_setattr (2).
|
|
|
|
.PP
|
|
|
|
Note that the constants used with these two APIs have different names.
|
|
|
|
The state of the reset-on-fork flag can analogously be retrieved using
|
|
|
|
.BR sched_getscheduler (2)
|
|
|
|
and
|
|
|
|
.BR sched_getattr (2).
|
|
|
|
|
|
|
|
The reset-on-fork feature is intended for media-playback applications,
|
2014-04-28 10:25:55 +00:00
|
|
|
and can be used to prevent applications evading the
|
|
|
|
.BR RLIMIT_RTTIME
|
|
|
|
resource limit (see
|
|
|
|
.BR getrlimit (2))
|
|
|
|
by creating multiple child processes.
|
|
|
|
|
2014-05-10 05:22:47 +00:00
|
|
|
More precisely, if the reset-on-fork flag is set,
|
2014-04-28 10:25:55 +00:00
|
|
|
the following rules apply for subsequently created children:
|
|
|
|
.IP * 3
|
|
|
|
If the calling thread has a scheduling policy of
|
|
|
|
.B SCHED_FIFO
|
|
|
|
or
|
|
|
|
.BR SCHED_RR ,
|
|
|
|
the policy is reset to
|
|
|
|
.BR SCHED_OTHER
|
|
|
|
in child processes.
|
|
|
|
.IP *
|
|
|
|
If the calling process has a negative nice value,
|
|
|
|
the nice value is reset to zero in child processes.
|
|
|
|
.PP
|
2014-05-10 05:22:47 +00:00
|
|
|
After the reset-on-fork flag has been enabled,
|
2014-04-28 10:25:55 +00:00
|
|
|
it can be reset only if the thread has the
|
|
|
|
.BR CAP_SYS_NICE
|
|
|
|
capability.
|
|
|
|
This flag is disabled in child processes created by
|
|
|
|
.BR fork (2).
|
|
|
|
.\"
|
|
|
|
.SS Privileges and resource limits
|
|
|
|
In Linux kernels before 2.6.12, only privileged
|
|
|
|
.RB ( CAP_SYS_NICE )
|
|
|
|
threads can set a nonzero static priority (i.e., set a real-time
|
|
|
|
scheduling policy).
|
|
|
|
The only change that an unprivileged thread can make is to set the
|
|
|
|
.B SCHED_OTHER
|
2014-05-12 07:30:25 +00:00
|
|
|
policy, and this can be done only if the effective user ID of the caller
|
2014-04-28 10:25:55 +00:00
|
|
|
matches the real or effective user ID of the target thread
|
|
|
|
(i.e., the thread specified by
|
|
|
|
.IR pid )
|
|
|
|
whose policy is being changed.
|
|
|
|
|
2014-05-12 13:18:43 +00:00
|
|
|
A thread must be privileged
|
|
|
|
.RB ( CAP_SYS_NICE )
|
2014-05-21 11:16:14 +00:00
|
|
|
in order to set or modify a
|
2014-05-12 13:18:43 +00:00
|
|
|
.BR SCHED_DEADLINE
|
|
|
|
policy.
|
|
|
|
|
2014-04-28 10:25:55 +00:00
|
|
|
Since Linux 2.6.12, the
|
|
|
|
.B RLIMIT_RTPRIO
|
|
|
|
resource limit defines a ceiling on an unprivileged thread's
|
|
|
|
static priority for the
|
|
|
|
.B SCHED_RR
|
|
|
|
and
|
|
|
|
.B SCHED_FIFO
|
|
|
|
policies.
|
|
|
|
The rules for changing scheduling policy and priority are as follows:
|
|
|
|
.IP * 3
|
|
|
|
If an unprivileged thread has a nonzero
|
|
|
|
.B RLIMIT_RTPRIO
|
|
|
|
soft limit, then it can change its scheduling policy and priority,
|
|
|
|
subject to the restriction that the priority cannot be set to a
|
|
|
|
value higher than the maximum of its current priority and its
|
|
|
|
.B RLIMIT_RTPRIO
|
|
|
|
soft limit.
|
|
|
|
.IP *
|
|
|
|
If the
|
|
|
|
.B RLIMIT_RTPRIO
|
|
|
|
soft limit is 0, then the only permitted changes are to lower the priority,
|
|
|
|
or to switch to a non-real-time policy.
|
|
|
|
.IP *
|
|
|
|
Subject to the same rules,
|
|
|
|
another unprivileged thread can also make these changes,
|
|
|
|
as long as the effective user ID of the thread making the change
|
|
|
|
matches the real or effective user ID of the target thread.
|
|
|
|
.IP *
|
|
|
|
Special rules apply for the
|
2014-05-12 08:10:06 +00:00
|
|
|
.BR SCHED_IDLE
|
|
|
|
policy.
|
2014-04-28 10:25:55 +00:00
|
|
|
In Linux kernels before 2.6.39,
|
|
|
|
an unprivileged thread operating under this policy cannot
|
|
|
|
change its policy, regardless of the value of its
|
|
|
|
.BR RLIMIT_RTPRIO
|
|
|
|
resource limit.
|
|
|
|
In Linux kernels since 2.6.39,
|
|
|
|
.\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
|
|
|
|
an unprivileged thread can switch to either the
|
|
|
|
.BR SCHED_BATCH
|
|
|
|
or the
|
|
|
|
.BR SCHED_NORMAL
|
|
|
|
policy so long as its nice value falls within the range permitted by its
|
|
|
|
.BR RLIMIT_NICE
|
|
|
|
resource limit (see
|
|
|
|
.BR getrlimit (2)).
|
|
|
|
.PP
|
|
|
|
Privileged
|
|
|
|
.RB ( CAP_SYS_NICE )
|
|
|
|
threads ignore the
|
|
|
|
.B RLIMIT_RTPRIO
|
|
|
|
limit; as with older kernels,
|
|
|
|
they can make arbitrary changes to scheduling policy and priority.
|
|
|
|
See
|
|
|
|
.BR getrlimit (2)
|
|
|
|
for further information on
|
|
|
|
.BR RLIMIT_RTPRIO .
|
2014-05-12 09:28:23 +00:00
|
|
|
.SS Limiting the CPU usage of real-time and deadline processes
|
|
|
|
A nonblocking infinite loop in a thread scheduled under the
|
|
|
|
.BR SCHED_FIFO ,
|
|
|
|
.BR SCHED_RR ,
|
|
|
|
or
|
|
|
|
.BR SCHED_DEADLINE
|
|
|
|
policy will block all threads with lower
|
|
|
|
priority forever.
|
|
|
|
Prior to Linux 2.6.25, the only way of preventing a runaway real-time
|
|
|
|
process from freezing the system was to run (at the console)
|
|
|
|
a shell scheduled under a higher static priority than the tested application.
|
|
|
|
This allows an emergency kill of tested
|
|
|
|
real-time applications that do not block or terminate as expected.
|
|
|
|
|
|
|
|
Since Linux 2.6.25, there are other techniques for dealing with runaway
|
|
|
|
real-time and deadline processes.
|
|
|
|
One of these is to use the
|
|
|
|
.BR RLIMIT_RTTIME
|
|
|
|
resource limit to set a ceiling on the CPU time that
|
|
|
|
a real-time process may consume.
|
|
|
|
See
|
|
|
|
.BR getrlimit (2)
|
|
|
|
for details.
|
|
|
|
|
|
|
|
Since version 2.6.25, Linux also provides two
|
|
|
|
.I /proc
|
|
|
|
files that can be used to reserve a certain amount of CPU time
|
|
|
|
to be used by non-real-time processes.
|
|
|
|
Reserving some CPU time in this fashion allows some CPU time to be
|
|
|
|
allocated to (say) a root shell that can be used to kill a runaway process.
|
|
|
|
Both of these files specify time values in microseconds:
|
|
|
|
.TP
|
|
|
|
.IR /proc/sys/kernel/sched_rt_period_us
|
|
|
|
This file specifies a scheduling period that is equivalent to
|
|
|
|
100% CPU bandwidth.
|
|
|
|
The value in this file can range from 1 to
|
|
|
|
.BR INT_MAX ,
|
|
|
|
giving an operating range of 1 microsecond to around 35 minutes.
|
|
|
|
The default value in this file is 1,000,000 (1 second).
|
|
|
|
.TP
|
|
|
|
.IR /proc/sys/kernel/sched_rt_runtime_us
|
|
|
|
The value in this file specifies how much of the "period" time
|
|
|
|
can be used by all real-time and deadline scheduled processes
|
|
|
|
on the system.
|
|
|
|
The value in this file can range from \-1 to
|
|
|
|
.BR INT_MAX \-1.
|
|
|
|
Specifying \-1 makes the runtime the same as the period;
|
|
|
|
that is, no CPU time is set aside for non-real-time processes
|
|
|
|
(which was the Linux behavior before kernel 2.6.25).
|
|
|
|
The default value in this file is 950,000 (0.95 seconds),
|
|
|
|
meaning that 5% of the CPU time is reserved for processes that
|
|
|
|
don't run under a real-time or deadline scheduling policy.
|
|
|
|
.PP
|
2014-04-28 10:25:55 +00:00
|
|
|
.SS Response time
|
2014-05-12 07:05:08 +00:00
|
|
|
A blocked high priority thread waiting for I/O has a certain
|
2014-04-28 10:25:55 +00:00
|
|
|
response time before it is scheduled again.
|
|
|
|
The device driver writer
|
|
|
|
can greatly reduce this response time by using a "slow interrupt"
|
|
|
|
interrupt handler.
|
|
|
|
.\" as described in
|
|
|
|
.\" .BR request_irq (9).
|
|
|
|
.SS Miscellaneous
|
|
|
|
Child processes inherit the scheduling policy and parameters across a
|
|
|
|
.BR fork (2).
|
|
|
|
The scheduling policy and parameters are preserved across
|
|
|
|
.BR execve (2).
|
|
|
|
|
|
|
|
Memory locking is usually needed for real-time processes to avoid
|
|
|
|
paging delays; this can be done with
|
|
|
|
.BR mlock (2)
|
|
|
|
or
|
|
|
|
.BR mlockall (2).
|
|
|
|
.SH NOTES
|
|
|
|
.PP
|
|
|
|
Originally, Standard Linux was intended as a general-purpose operating
|
|
|
|
system being able to handle background processes, interactive
|
|
|
|
applications, and less demanding real-time applications (applications that
|
|
|
|
need to usually meet timing deadlines).
|
|
|
|
Although the Linux kernel 2.6
|
|
|
|
allowed for kernel preemption and the newly introduced O(1) scheduler
|
|
|
|
ensures that the time needed to schedule is fixed and deterministic
|
|
|
|
irrespective of the number of active tasks, true real-time computing
|
|
|
|
was not possible up to kernel version 2.6.17.
|
|
|
|
.SS Real-time features in the mainline Linux kernel
|
|
|
|
.\" FIXME . Probably this text will need some minor tweaking
|
|
|
|
.\" by about the time of 2.6.30; ask Carsten Emde about this then.
|
|
|
|
From kernel version 2.6.18 onward, however, Linux is gradually
|
|
|
|
becoming equipped with real-time capabilities,
|
|
|
|
most of which are derived from the former
|
|
|
|
.I realtime-preempt
|
|
|
|
patches developed by Ingo Molnar, Thomas Gleixner,
|
|
|
|
Steven Rostedt, and others.
|
|
|
|
Until the patches have been completely merged into the
|
|
|
|
mainline kernel
|
|
|
|
(this is expected to be around kernel version 2.6.30),
|
|
|
|
they must be installed to achieve the best real-time performance.
|
|
|
|
These patches are named:
|
|
|
|
.in +4n
|
|
|
|
.nf
|
|
|
|
|
|
|
|
patch-\fIkernelversion\fP-rt\fIpatchversion\fP
|
|
|
|
.fi
|
|
|
|
.in
|
|
|
|
.PP
|
|
|
|
and can be downloaded from
|
|
|
|
.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
|
|
|
|
.UE .
|
|
|
|
|
|
|
|
Without the patches and prior to their full inclusion into the mainline
|
|
|
|
kernel, the kernel configuration offers only the three preemption classes
|
|
|
|
.BR CONFIG_PREEMPT_NONE ,
|
|
|
|
.BR CONFIG_PREEMPT_VOLUNTARY ,
|
|
|
|
and
|
|
|
|
.B CONFIG_PREEMPT_DESKTOP
|
|
|
|
which respectively provide no, some, and considerable
|
|
|
|
reduction of the worst-case scheduling latency.
|
|
|
|
|
|
|
|
With the patches applied or after their full inclusion into the mainline
|
|
|
|
kernel, the additional configuration item
|
|
|
|
.B CONFIG_PREEMPT_RT
|
|
|
|
becomes available.
|
|
|
|
If this is selected, Linux is transformed into a regular
|
|
|
|
real-time operating system.
|
2014-05-12 07:30:25 +00:00
|
|
|
The FIFO and RR scheduling policies are then used to run a thread
|
2014-04-28 10:25:55 +00:00
|
|
|
with true real-time priority and a minimum worst-case scheduling latency.
|
|
|
|
.SH SEE ALSO
|
|
|
|
.ad l
|
|
|
|
.nh
|
|
|
|
.BR chrt (1),
|
2014-09-15 09:03:58 +00:00
|
|
|
.BR taskset (1),
|
2014-04-28 10:25:55 +00:00
|
|
|
.BR getpriority (2),
|
|
|
|
.BR mlock (2),
|
|
|
|
.BR mlockall (2),
|
|
|
|
.BR munlock (2),
|
|
|
|
.BR munlockall (2),
|
|
|
|
.BR nice (2),
|
|
|
|
.BR sched_get_priority_max (2),
|
|
|
|
.BR sched_get_priority_min (2),
|
2014-04-28 10:27:34 +00:00
|
|
|
.BR sched_getscheduler (2),
|
2014-04-28 10:25:55 +00:00
|
|
|
.BR sched_getaffinity (2),
|
|
|
|
.BR sched_getparam (2),
|
|
|
|
.BR sched_rr_get_interval (2),
|
|
|
|
.BR sched_setaffinity (2),
|
2014-04-28 10:27:34 +00:00
|
|
|
.BR sched_setscheduler (2),
|
2014-04-28 10:25:55 +00:00
|
|
|
.BR sched_setparam (2),
|
|
|
|
.BR sched_yield (2),
|
|
|
|
.BR setpriority (2),
|
2014-04-28 10:27:34 +00:00
|
|
|
.BR pthread_getaffinity_np (3),
|
|
|
|
.BR pthread_setaffinity_np (3),
|
|
|
|
.BR sched_getcpu (3),
|
2014-04-28 10:25:55 +00:00
|
|
|
.BR capabilities (7),
|
|
|
|
.BR cpuset (7)
|
|
|
|
.ad
|
|
|
|
.PP
|
|
|
|
.I Programming for the real world \- POSIX.4
|
|
|
|
by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
|
|
|
|
.PP
|
2014-05-08 12:38:36 +00:00
|
|
|
The Linux kernel source files
|
|
|
|
.IR Documentation/scheduler/sched-deadline.txt ,
|
|
|
|
.IR Documentation/scheduler/sched-rt-group.txt ,
|
2014-05-12 06:15:36 +00:00
|
|
|
.IR Documentation/scheduler/sched-design-CFS.txt ,
|
2014-05-08 12:38:36 +00:00
|
|
|
and
|
2014-05-12 06:15:50 +00:00
|
|
|
.IR Documentation/scheduler/sched-nice-design.txt
|