mirror of https://github.com/mkerrisk/man-pages
1105 lines
30 KiB
Groff
1105 lines
30 KiB
Groff
.\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\"
|
|
.\" %%%LICENSE_START(verbatim)
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
.\" preserved on all copies.
|
|
.\"
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
.\" permission notice identical to this one.
|
|
.\"
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
.\" have taken the same level of care in the production of this manual,
|
|
.\" which is licensed free of charge, as they might when working
|
|
.\" professionally.
|
|
.\"
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\" 6 Aug 2002 - Initial Creation
|
|
.\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
|
|
.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
|
|
.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
|
|
.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
|
|
.\" 2008-07-15, Serge Hallyn <serue@us.bbm.com>
|
|
.\" Document file capabilities, per-process capability
|
|
.\" bounding set, changed semantics for CAP_SETPCAP,
|
|
.\" and other changes in 2.6.2[45].
|
|
.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
|
|
.\" 2008-07-15, mtk
|
|
.\" Add text describing circumstances in which CAP_SETPCAP
|
|
.\" (theoretically) permits a thread to change the
|
|
.\" capability sets of another thread.
|
|
.\" Add section describing rules for programmatically
|
|
.\" adjusting thread capability sets.
|
|
.\" Describe rationale for capability bounding set.
|
|
.\" Document "securebits" flags.
|
|
.\" Add text noting that if we set the effective flag for one file
|
|
.\" capability, then we must also set the effective flag for all
|
|
.\" other capabilities where the permitted or inheritable bit is set.
|
|
.\" 2011-09-07, mtk/Serge hallyn: Add CAP_SYSLOG
|
|
.\"
|
|
.TH CAPABILITIES 7 2013-02-25 "Linux" "Linux Programmer's Manual"
|
|
.SH NAME
|
|
capabilities \- overview of Linux capabilities
|
|
.SH DESCRIPTION
|
|
For the purpose of performing permission checks,
|
|
traditional UNIX implementations distinguish two categories of processes:
|
|
.I privileged
|
|
processes (whose effective user ID is 0, referred to as superuser or root),
|
|
and
|
|
.I unprivileged
|
|
processes (whose effective UID is nonzero).
|
|
Privileged processes bypass all kernel permission checks,
|
|
while unprivileged processes are subject to full permission
|
|
checking based on the process's credentials
|
|
(usually: effective UID, effective GID, and supplementary group list).
|
|
|
|
Starting with kernel 2.2, Linux divides the privileges traditionally
|
|
associated with superuser into distinct units, known as
|
|
.IR capabilities ,
|
|
which can be independently enabled and disabled.
|
|
Capabilities are a per-thread attribute.
|
|
.\"
|
|
.SS Capabilities list
|
|
The following list shows the capabilities implemented on Linux,
|
|
and the operations or behaviors that each capability permits:
|
|
.TP
|
|
.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
|
|
Enable and disable kernel auditing; change auditing filter rules;
|
|
retrieve auditing status and filtering rules.
|
|
.TP
|
|
.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
|
|
Write records to kernel auditing log.
|
|
.TP
|
|
.BR CAP_BLOCK_SUSPEND " (since Linux 3.5)"
|
|
Employ features that can block system suspend
|
|
.RB ( epoll (7)
|
|
.BR EPOLLWAKEUP ,
|
|
.IR /proc/sys/wake_lock ).
|
|
.TP
|
|
.B CAP_CHOWN
|
|
Make arbitrary changes to file UIDs and GIDs (see
|
|
.BR chown (2)).
|
|
.TP
|
|
.B CAP_DAC_OVERRIDE
|
|
Bypass file read, write, and execute permission checks.
|
|
(DAC is an abbreviation of "discretionary access control".)
|
|
.TP
|
|
.B CAP_DAC_READ_SEARCH
|
|
Bypass file read permission checks and
|
|
directory read and execute permission checks.
|
|
.TP
|
|
.B CAP_FOWNER
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
Bypass permission checks on operations that normally
|
|
require the file system UID of the process to match the UID of
|
|
the file (e.g.,
|
|
.BR chmod (2),
|
|
.BR utime (2)),
|
|
excluding those operations covered by
|
|
.B CAP_DAC_OVERRIDE
|
|
and
|
|
.BR CAP_DAC_READ_SEARCH ;
|
|
.IP *
|
|
set extended file attributes (see
|
|
.BR chattr (1))
|
|
on arbitrary files;
|
|
.IP *
|
|
set Access Control Lists (ACLs) on arbitrary files;
|
|
.IP *
|
|
ignore directory sticky bit on file deletion;
|
|
.IP *
|
|
specify
|
|
.B O_NOATIME
|
|
for arbitrary files in
|
|
.BR open (2)
|
|
and
|
|
.BR fcntl (2).
|
|
.RE
|
|
.PD
|
|
.TP
|
|
.B CAP_FSETID
|
|
Don't clear set-user-ID and set-group-ID permission
|
|
bits when a file is modified;
|
|
set the set-group-ID bit for a file whose GID does not match
|
|
the file system or any of the supplementary GIDs of the calling process.
|
|
.TP
|
|
.B CAP_IPC_LOCK
|
|
.\" FIXME As at Linux 3.2, there are some strange uses of this capability
|
|
.\" in other places; they probably should be replaced with something else.
|
|
Lock memory
|
|
.RB ( mlock (2),
|
|
.BR mlockall (2),
|
|
.BR mmap (2),
|
|
.BR shmctl (2)).
|
|
.TP
|
|
.B CAP_IPC_OWNER
|
|
Bypass permission checks for operations on System V IPC objects.
|
|
.TP
|
|
.B CAP_KILL
|
|
Bypass permission checks for sending signals (see
|
|
.BR kill (2)).
|
|
This includes use of the
|
|
.BR ioctl (2)
|
|
.B KDSIGACCEPT
|
|
operation.
|
|
.\" FIXME CAP_KILL also has an effect for threads + setting child
|
|
.\" termination signal to other than SIGCHLD: without this
|
|
.\" capability, the termination signal reverts to SIGCHLD
|
|
.\" if the child does an exec(). What is the rationale
|
|
.\" for this?
|
|
.TP
|
|
.BR CAP_LEASE " (since Linux 2.4)"
|
|
Establish leases on arbitrary files (see
|
|
.BR fcntl (2)).
|
|
.TP
|
|
.B CAP_LINUX_IMMUTABLE
|
|
Set the
|
|
.B FS_APPEND_FL
|
|
and
|
|
.B FS_IMMUTABLE_FL
|
|
.\" These attributes are now available on ext2, ext3, Reiserfs, XFS, JFS
|
|
i-node flags (see
|
|
.BR chattr (1)).
|
|
.TP
|
|
.BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
|
|
Override Mandatory Access Control (MAC).
|
|
Implemented for the Smack Linux Security Module (LSM).
|
|
.TP
|
|
.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
|
|
Allow MAC configuration or state changes.
|
|
Implemented for the Smack LSM.
|
|
.TP
|
|
.BR CAP_MKNOD " (since Linux 2.4)"
|
|
Create special files using
|
|
.BR mknod (2).
|
|
.TP
|
|
.B CAP_NET_ADMIN
|
|
Perform various network-related operations:
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
interface configuration;
|
|
.IP *
|
|
administration of IP firewall, masquerading, and accounting
|
|
.IP *
|
|
modify routing tables;
|
|
.IP *
|
|
bind to any address for transparent proxying;
|
|
.IP *
|
|
set type-of-service (TOS)
|
|
.IP *
|
|
clear driver statistics;
|
|
.IP *
|
|
set promiscuous mode;
|
|
.IP *
|
|
enabling multicasting;
|
|
.IP *
|
|
use
|
|
.BR setsockopt (2)
|
|
to set the following socket options:
|
|
.BR SO_DEBUG ,
|
|
.BR SO_MARK ,
|
|
.BR SO_PRIORITY
|
|
(for a priority outside the range 0 to 6),
|
|
.BR SO_RCVBUFFORCE ,
|
|
and
|
|
.BR SO_SNDBUFFORCE .
|
|
.RE
|
|
.PD
|
|
.TP
|
|
.B CAP_NET_BIND_SERVICE
|
|
Bind a socket to Internet domain privileged ports
|
|
(port numbers less than 1024).
|
|
.TP
|
|
.B CAP_NET_BROADCAST
|
|
(Unused) Make socket broadcasts, and listen to multicasts.
|
|
.TP
|
|
.B CAP_NET_RAW
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
use RAW and PACKET sockets;
|
|
.IP *
|
|
bind to any address for transparent proxying.
|
|
.RE
|
|
.PD
|
|
.\" Also various IP options and setsockopt(SO_BINDTODEVICE)
|
|
.TP
|
|
.B CAP_SETGID
|
|
Make arbitrary manipulations of process GIDs and supplementary GID list;
|
|
forge GID when passing socket credentials via UNIX domain sockets.
|
|
.TP
|
|
.BR CAP_SETFCAP " (since Linux 2.6.24)"
|
|
Set file capabilities.
|
|
.TP
|
|
.B CAP_SETPCAP
|
|
If file capabilities are not supported:
|
|
grant or remove any capability in the
|
|
caller's permitted capability set to or from any other process.
|
|
(This property of
|
|
.B CAP_SETPCAP
|
|
is not available when the kernel is configured to support
|
|
file capabilities, since
|
|
.B CAP_SETPCAP
|
|
has entirely different semantics for such kernels.)
|
|
|
|
If file capabilities are supported:
|
|
add any capability from the calling thread's bounding set
|
|
to its inheritable set;
|
|
drop capabilities from the bounding set (via
|
|
.BR prctl (2)
|
|
.BR PR_CAPBSET_DROP );
|
|
make changes to the
|
|
.I securebits
|
|
flags.
|
|
.TP
|
|
.B CAP_SETUID
|
|
Make arbitrary manipulations of process UIDs
|
|
.RB ( setuid (2),
|
|
.BR setreuid (2),
|
|
.BR setresuid (2),
|
|
.BR setfsuid (2));
|
|
make forged UID when passing socket credentials via UNIX domain sockets.
|
|
.\" FIXME CAP_SETUID also an effect in exec(); document this.
|
|
.TP
|
|
.B CAP_SYS_ADMIN
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
Perform a range of system administration operations including:
|
|
.BR quotactl (2),
|
|
.BR mount (2),
|
|
.BR umount (2),
|
|
.BR swapon (2),
|
|
.BR swapoff (2),
|
|
.BR sethostname (2),
|
|
and
|
|
.BR setdomainname (2);
|
|
.IP *
|
|
perform privileged
|
|
.BR syslog (2)
|
|
operations (since Linux 2.6.37,
|
|
.BR CAP_SYSLOG
|
|
should be used to permit such operations);
|
|
.IP *
|
|
perform
|
|
.B VM86_REQUEST_IRQ
|
|
.BR vm86 (2)
|
|
command;
|
|
.IP *
|
|
perform
|
|
.B IPC_SET
|
|
and
|
|
.B IPC_RMID
|
|
operations on arbitrary System V IPC objects;
|
|
.IP *
|
|
perform operations on
|
|
.I trusted
|
|
and
|
|
.I security
|
|
Extended Attributes (see
|
|
.BR attr (5));
|
|
.IP *
|
|
use
|
|
.BR lookup_dcookie (2);
|
|
.IP *
|
|
use
|
|
.BR ioprio_set (2)
|
|
to assign
|
|
.B IOPRIO_CLASS_RT
|
|
and (before Linux 2.6.25)
|
|
.B IOPRIO_CLASS_IDLE
|
|
I/O scheduling classes;
|
|
.IP *
|
|
forge UID when passing socket credentials;
|
|
.IP *
|
|
exceed
|
|
.IR /proc/sys/fs/file-max ,
|
|
the system-wide limit on the number of open files,
|
|
in system calls that open files (e.g.,
|
|
.BR accept (2),
|
|
.BR execve (2),
|
|
.BR open (2),
|
|
.BR pipe (2));
|
|
.IP *
|
|
employ
|
|
.B CLONE_*
|
|
flags that create new namespaces with
|
|
.BR clone (2)
|
|
and
|
|
.BR unshare (2);
|
|
.IP *
|
|
call
|
|
.BR perf_event_open (2);
|
|
.IP *
|
|
access privileged
|
|
.I perf
|
|
event information;
|
|
.IP *
|
|
call
|
|
.BR setns (2);
|
|
.IP *
|
|
call
|
|
.BR fanotify_init (2);
|
|
.IP *
|
|
perform
|
|
.B KEYCTL_CHOWN
|
|
and
|
|
.B KEYCTL_SETPERM
|
|
.BR keyctl (2)
|
|
operations;
|
|
.IP *
|
|
perform
|
|
.BR madvise (2)
|
|
.B MADV_HWPOISON
|
|
operation;
|
|
.IP *
|
|
employ the
|
|
.B TIOCSTI
|
|
.BR ioctl (2)
|
|
to insert characters into the input queue of a terminal other than
|
|
the caller's controlling terminal.
|
|
.IP *
|
|
employ the obsolete
|
|
.BR nfsservctl (2)
|
|
system call;
|
|
.IP *
|
|
employ the obsolete
|
|
.BR bdflush (2)
|
|
system call;
|
|
.IP *
|
|
perform various privileged block-device
|
|
.BR ioctl (2)
|
|
operations;
|
|
.IP *
|
|
perform various privileged file-system
|
|
.BR ioctl (2)
|
|
operations;
|
|
.IP *
|
|
perform administrative operations on many device drivers.
|
|
.RE
|
|
.PD
|
|
.TP
|
|
.B CAP_SYS_BOOT
|
|
Use
|
|
.BR reboot (2)
|
|
and
|
|
.BR kexec_load (2).
|
|
.TP
|
|
.B CAP_SYS_CHROOT
|
|
Use
|
|
.BR chroot (2).
|
|
.TP
|
|
.B CAP_SYS_MODULE
|
|
Load and unload kernel modules
|
|
(see
|
|
.BR init_module (2)
|
|
and
|
|
.BR delete_module (2));
|
|
in kernels before 2.6.25:
|
|
drop capabilities from the system-wide capability bounding set.
|
|
.TP
|
|
.B CAP_SYS_NICE
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
Raise process nice value
|
|
.RB ( nice (2),
|
|
.BR setpriority (2))
|
|
and change the nice value for arbitrary processes;
|
|
.IP *
|
|
set real-time scheduling policies for calling process,
|
|
and set scheduling policies and priorities for arbitrary processes
|
|
.RB ( sched_setscheduler (2),
|
|
.BR sched_setparam (2));
|
|
.IP *
|
|
set CPU affinity for arbitrary processes
|
|
.RB ( sched_setaffinity (2));
|
|
.IP *
|
|
set I/O scheduling class and priority for arbitrary processes
|
|
.RB ( ioprio_set (2));
|
|
.IP *
|
|
apply
|
|
.BR migrate_pages (2)
|
|
to arbitrary processes and allow processes
|
|
to be migrated to arbitrary nodes;
|
|
.\" FIXME CAP_SYS_NICE also has the following effect for
|
|
.\" migrate_pages(2):
|
|
.\" do_migrate_pages(mm, &old, &new,
|
|
.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
|
|
.IP *
|
|
apply
|
|
.BR move_pages (2)
|
|
to arbitrary processes;
|
|
.IP *
|
|
use the
|
|
.B MPOL_MF_MOVE_ALL
|
|
flag with
|
|
.BR mbind (2)
|
|
and
|
|
.BR move_pages (2).
|
|
.RE
|
|
.PD
|
|
.TP
|
|
.B CAP_SYS_PACCT
|
|
Use
|
|
.BR acct (2).
|
|
.TP
|
|
.B CAP_SYS_PTRACE
|
|
Trace arbitrary processes using
|
|
.BR ptrace (2);
|
|
apply
|
|
.BR get_robust_list (2)
|
|
to arbitrary processes;
|
|
inspect processes using
|
|
.BR kcmp (2).
|
|
.TP
|
|
.B CAP_SYS_RAWIO
|
|
Perform I/O port operations
|
|
.RB ( iopl (2)
|
|
and
|
|
.BR ioperm (2));
|
|
access
|
|
.IR /proc/kcore ;
|
|
employ the
|
|
.B FIBMAP
|
|
.BR ioctl (2)
|
|
operation.
|
|
.TP
|
|
.B CAP_SYS_RESOURCE
|
|
.PD 0
|
|
.RS
|
|
.IP * 2
|
|
Use reserved space on ext2 file systems;
|
|
.IP *
|
|
make
|
|
.BR ioctl (2)
|
|
calls controlling ext3 journaling;
|
|
.IP *
|
|
override disk quota limits;
|
|
.IP *
|
|
increase resource limits (see
|
|
.BR setrlimit (2));
|
|
.IP *
|
|
override
|
|
.B RLIMIT_NPROC
|
|
resource limit;
|
|
.IP *
|
|
override maximum number of consoles on console allocation;
|
|
.IP *
|
|
override maximum number of keymaps;
|
|
.IP *
|
|
allow more than 64hz interrupts from the real-time clock;
|
|
.IP *
|
|
raise
|
|
.I msg_qbytes
|
|
limit for a System V message queue above the limit in
|
|
.I /proc/sys/kernel/msgmnb
|
|
(see
|
|
.BR msgop (2)
|
|
and
|
|
.BR msgctl (2));
|
|
.IP *
|
|
override the
|
|
.I /proc/sys/fs/pipe-size-max
|
|
limit when setting the capacity of a pipe using the
|
|
.B F_SETPIPE_SZ
|
|
.BR fcntl (2)
|
|
command.
|
|
.IP *
|
|
use
|
|
.BR F_SETPIPE_SZ
|
|
to increase the capacity of a pipe above the limit specified by
|
|
.IR /proc/sys/fs/pipe-max-size ;
|
|
.IP *
|
|
override
|
|
.I /proc/sys/fs/mqueue/queues_max
|
|
limit when creating POSIX message queues (see
|
|
.BR mq_overview (7));
|
|
.IP *
|
|
employ
|
|
.BR prctl (2)
|
|
.B PR_SET_MM
|
|
operation.
|
|
.RE
|
|
.PD
|
|
.TP
|
|
.B CAP_SYS_TIME
|
|
Set system clock
|
|
.RB ( settimeofday (2),
|
|
.BR stime (2),
|
|
.BR adjtimex (2));
|
|
set real-time (hardware) clock.
|
|
.TP
|
|
.B CAP_SYS_TTY_CONFIG
|
|
Use
|
|
.BR vhangup (2);
|
|
employ various privileged
|
|
.BR ioctl (2)
|
|
operations on virtual terminals.
|
|
.TP
|
|
.BR CAP_SYSLOG " (since Linux 2.6.37)"
|
|
.IP * 3
|
|
Perform privileged
|
|
.BR syslog (2)
|
|
operations.
|
|
See
|
|
.BR syslog (2)
|
|
for information on which operations require privilege.
|
|
.IP *
|
|
View kernel addresses exposed via
|
|
.I /proc
|
|
and other interfaces when
|
|
.IR /proc/sys/kernel/kptr_restrict
|
|
has the value 1.
|
|
(See the discussion of the
|
|
.I kptr_restrict
|
|
in
|
|
.BR proc (5).)
|
|
.TP
|
|
.BR CAP_WAKE_ALARM " (since Linux 3.0)"
|
|
Trigger something that will wake up the system (set
|
|
.B CLOCK_REALTIME_ALARM
|
|
and
|
|
.B CLOCK_BOOTTIME_ALARM
|
|
timers).
|
|
.\"
|
|
.SS Past and current implementation
|
|
A full implementation of capabilities requires that:
|
|
.IP 1. 3
|
|
For all privileged operations,
|
|
the kernel must check whether the thread has the required
|
|
capability in its effective set.
|
|
.IP 2.
|
|
The kernel must provide system calls allowing a thread's capability sets to
|
|
be changed and retrieved.
|
|
.IP 3.
|
|
The file system must support attaching capabilities to an executable file,
|
|
so that a process gains those capabilities when the file is executed.
|
|
.PP
|
|
Before kernel 2.6.24, only the first two of these requirements are met;
|
|
since kernel 2.6.24, all three requirements are met.
|
|
.\"
|
|
.SS Thread capability sets
|
|
Each thread has three capability sets containing zero or more
|
|
of the above capabilities:
|
|
.TP
|
|
.IR Permitted :
|
|
This is a limiting superset for the effective
|
|
capabilities that the thread may assume.
|
|
It is also a limiting superset for the capabilities that
|
|
may be added to the inheritable set by a thread that does not have the
|
|
.B CAP_SETPCAP
|
|
capability in its effective set.
|
|
|
|
If a thread drops a capability from its permitted set,
|
|
it can never reacquire that capability (unless it
|
|
.BR execve (2)s
|
|
either a set-user-ID-root program, or
|
|
a program whose associated file capabilities grant that capability).
|
|
.TP
|
|
.IR Inheritable :
|
|
This is a set of capabilities preserved across an
|
|
.BR execve (2).
|
|
It provides a mechanism for a process to assign capabilities
|
|
to the permitted set of the new program during an
|
|
.BR execve (2).
|
|
.TP
|
|
.IR Effective :
|
|
This is the set of capabilities used by the kernel to
|
|
perform permission checks for the thread.
|
|
.PP
|
|
A child created via
|
|
.BR fork (2)
|
|
inherits copies of its parent's capability sets.
|
|
See below for a discussion of the treatment of capabilities during
|
|
.BR execve (2).
|
|
.PP
|
|
Using
|
|
.BR capset (2),
|
|
a thread may manipulate its own capability sets (see below).
|
|
.\"
|
|
.SS File capabilities
|
|
Since kernel 2.6.24, the kernel supports
|
|
associating capability sets with an executable file using
|
|
.BR setcap (8).
|
|
The file capability sets are stored in an extended attribute (see
|
|
.BR setxattr (2))
|
|
named
|
|
.IR "security.capability" .
|
|
Writing to this extended attribute requires the
|
|
.BR CAP_SETFCAP
|
|
capability.
|
|
The file capability sets,
|
|
in conjunction with the capability sets of the thread,
|
|
determine the capabilities of a thread after an
|
|
.BR execve (2).
|
|
|
|
The three file capability sets are:
|
|
.TP
|
|
.IR Permitted " (formerly known as " forced ):
|
|
These capabilities are automatically permitted to the thread,
|
|
regardless of the thread's inheritable capabilities.
|
|
.TP
|
|
.IR Inheritable " (formerly known as " allowed ):
|
|
This set is ANDed with the thread's inheritable set to determine which
|
|
inheritable capabilities are enabled in the permitted set of
|
|
the thread after the
|
|
.BR execve (2).
|
|
.TP
|
|
.IR Effective :
|
|
This is not a set, but rather just a single bit.
|
|
If this bit is set, then during an
|
|
.BR execve (2)
|
|
all of the new permitted capabilities for the thread are
|
|
also raised in the effective set.
|
|
If this bit is not set, then after an
|
|
.BR execve (2),
|
|
none of the new permitted capabilities is in the new effective set.
|
|
|
|
Enabling the file effective capability bit implies
|
|
that any file permitted or inheritable capability that causes a
|
|
thread to acquire the corresponding permitted capability during an
|
|
.BR execve (2)
|
|
(see the transformation rules described below) will also acquire that
|
|
capability in its effective set.
|
|
Therefore, when assigning capabilities to a file
|
|
.RB ( setcap (8),
|
|
.BR cap_set_file (3),
|
|
.BR cap_set_fd (3)),
|
|
if we specify the effective flag as being enabled for any capability,
|
|
then the effective flag must also be specified as enabled
|
|
for all other capabilities for which the corresponding permitted or
|
|
inheritable flags is enabled.
|
|
.\"
|
|
.SS Transformation of capabilities during execve()
|
|
.PP
|
|
During an
|
|
.BR execve (2),
|
|
the kernel calculates the new capabilities of
|
|
the process using the following algorithm:
|
|
.in +4n
|
|
.nf
|
|
|
|
P'(permitted) = (P(inheritable) & F(inheritable)) |
|
|
(F(permitted) & cap_bset)
|
|
|
|
P'(effective) = F(effective) ? P'(permitted) : 0
|
|
|
|
P'(inheritable) = P(inheritable) [i.e., unchanged]
|
|
|
|
.fi
|
|
.in
|
|
where:
|
|
.RS 4
|
|
.IP P 10
|
|
denotes the value of a thread capability set before the
|
|
.BR execve (2)
|
|
.IP P'
|
|
denotes the value of a capability set after the
|
|
.BR execve (2)
|
|
.IP F
|
|
denotes a file capability set
|
|
.IP cap_bset
|
|
is the value of the capability bounding set (described below).
|
|
.RE
|
|
.\"
|
|
.SS Capabilities and execution of programs by root
|
|
In order to provide an all-powerful
|
|
.I root
|
|
using capability sets, during an
|
|
.BR execve (2):
|
|
.IP 1. 3
|
|
If a set-user-ID-root program is being executed,
|
|
or the real user ID of the process is 0 (root)
|
|
then the file inheritable and permitted sets are defined to be all ones
|
|
(i.e., all capabilities enabled).
|
|
.IP 2.
|
|
If a set-user-ID-root program is being executed,
|
|
then the file effective bit is defined to be one (enabled).
|
|
.PP
|
|
The upshot of the above rules,
|
|
combined with the capabilities transformations described above,
|
|
is that when a process
|
|
.BR execve (2)s
|
|
a set-user-ID-root program, or when a process with an effective UID of 0
|
|
.BR execve (2)s
|
|
a program,
|
|
it gains all capabilities in its permitted and effective capability sets,
|
|
except those masked out by the capability bounding set.
|
|
.\" If a process with real UID 0, and nonzero effective UID does an
|
|
.\" exec(), then it gets all capabilities in its
|
|
.\" permitted set, and no effective capabilities
|
|
This provides semantics that are the same as those provided by
|
|
traditional UNIX systems.
|
|
.SS Capability bounding set
|
|
The capability bounding set is a security mechanism that can be used
|
|
to limit the capabilities that can be gained during an
|
|
.BR execve (2).
|
|
The bounding set is used in the following ways:
|
|
.IP * 2
|
|
During an
|
|
.BR execve (2),
|
|
the capability bounding set is ANDed with the file permitted
|
|
capability set, and the result of this operation is assigned to the
|
|
thread's permitted capability set.
|
|
The capability bounding set thus places a limit on the permitted
|
|
capabilities that may be granted by an executable file.
|
|
.IP *
|
|
(Since Linux 2.6.25)
|
|
The capability bounding set acts as a limiting superset for
|
|
the capabilities that a thread can add to its inheritable set using
|
|
.BR capset (2).
|
|
This means that if a capability is not in the bounding set,
|
|
then a thread can't add this capability to its
|
|
inheritable set, even if it was in its permitted capabilities,
|
|
and thereby cannot have this capability preserved in its
|
|
permitted set when it
|
|
.BR execve (2)s
|
|
a file that has the capability in its inheritable set.
|
|
.PP
|
|
Note that the bounding set masks the file permitted capabilities,
|
|
but not the inherited capabilities.
|
|
If a thread maintains a capability in its inherited set
|
|
that is not in its bounding set,
|
|
then it can still gain that capability in its permitted set
|
|
by executing a file that has the capability in its inherited set.
|
|
.PP
|
|
Depending on the kernel version, the capability bounding set is either
|
|
a system-wide attribute, or a per-process attribute.
|
|
.PP
|
|
.B "Capability bounding set prior to Linux 2.6.25"
|
|
.PP
|
|
In kernels before 2.6.25, the capability bounding set is a system-wide
|
|
attribute that affects all threads on the system.
|
|
The bounding set is accessible via the file
|
|
.IR /proc/sys/kernel/cap-bound .
|
|
(Confusingly, this bit mask parameter is expressed as a
|
|
signed decimal number in
|
|
.IR /proc/sys/kernel/cap-bound .)
|
|
|
|
Only the
|
|
.B init
|
|
process may set capabilities in the capability bounding set;
|
|
other than that, the superuser (more precisely: programs with the
|
|
.B CAP_SYS_MODULE
|
|
capability) may only clear capabilities from this set.
|
|
|
|
On a standard system the capability bounding set always masks out the
|
|
.B CAP_SETPCAP
|
|
capability.
|
|
To remove this restriction (dangerous!), modify the definition of
|
|
.B CAP_INIT_EFF_SET
|
|
in
|
|
.I include/linux/capability.h
|
|
and rebuild the kernel.
|
|
|
|
The system-wide capability bounding set feature was added
|
|
to Linux starting with kernel version 2.2.11.
|
|
.\"
|
|
.PP
|
|
.B "Capability bounding set from Linux 2.6.25 onward"
|
|
.PP
|
|
From Linux 2.6.25, the
|
|
.I "capability bounding set"
|
|
is a per-thread attribute.
|
|
(There is no longer a system-wide capability bounding set.)
|
|
|
|
The bounding set is inherited at
|
|
.BR fork (2)
|
|
from the thread's parent, and is preserved across an
|
|
.BR execve (2).
|
|
|
|
A thread may remove capabilities from its capability bounding set using the
|
|
.BR prctl (2)
|
|
.B PR_CAPBSET_DROP
|
|
operation, provided it has the
|
|
.B CAP_SETPCAP
|
|
capability.
|
|
Once a capability has been dropped from the bounding set,
|
|
it cannot be restored to that set.
|
|
A thread can determine if a capability is in its bounding set using the
|
|
.BR prctl (2)
|
|
.B PR_CAPBSET_READ
|
|
operation.
|
|
|
|
Removing capabilities from the bounding set is only supported if file
|
|
capabilities are compiled into the kernel.
|
|
In kernels before Linux 2.6.33,
|
|
file capabilities were an optional feature configurable via the
|
|
CONFIG_SECURITY_FILE_CAPABILITIES
|
|
option.
|
|
Since Linux 2.6.33, the configuration option has been removed
|
|
and file capabilities are always part of the kernel.
|
|
When file capabilities are compiled into the kernel, the
|
|
.B init
|
|
process (the ancestor of all processes) begins with a full bounding set.
|
|
If file capabilities are not compiled into the kernel, then
|
|
.B init
|
|
begins with a full bounding set minus
|
|
.BR CAP_SETPCAP ,
|
|
because this capability has a different meaning when there are
|
|
no file capabilities.
|
|
|
|
Removing a capability from the bounding set does not remove it
|
|
from the thread's inherited set.
|
|
However it does prevent the capability from being added
|
|
back into the thread's inherited set in the future.
|
|
.\"
|
|
.\"
|
|
.SS Effect of user ID changes on capabilities
|
|
To preserve the traditional semantics for transitions between
|
|
0 and nonzero user IDs,
|
|
the kernel makes the following changes to a thread's capability
|
|
sets on changes to the thread's real, effective, saved set,
|
|
and file system user IDs (using
|
|
.BR setuid (2),
|
|
.BR setresuid (2),
|
|
or similar):
|
|
.IP 1. 3
|
|
If one or more of the real, effective or saved set user IDs
|
|
was previously 0, and as a result of the UID changes all of these IDs
|
|
have a nonzero value,
|
|
then all capabilities are cleared from the permitted and effective
|
|
capability sets.
|
|
.IP 2.
|
|
If the effective user ID is changed from 0 to nonzero,
|
|
then all capabilities are cleared from the effective set.
|
|
.IP 3.
|
|
If the effective user ID is changed from nonzero to 0,
|
|
then the permitted set is copied to the effective set.
|
|
.IP 4.
|
|
If the file system user ID is changed from 0 to nonzero (see
|
|
.BR setfsuid (2))
|
|
then the following capabilities are cleared from the effective set:
|
|
.BR CAP_CHOWN ,
|
|
.BR CAP_DAC_OVERRIDE ,
|
|
.BR CAP_DAC_READ_SEARCH ,
|
|
.BR CAP_FOWNER ,
|
|
.BR CAP_FSETID ,
|
|
.B CAP_LINUX_IMMUTABLE
|
|
(since Linux 2.2.30),
|
|
.BR CAP_MAC_OVERRIDE ,
|
|
and
|
|
.B CAP_MKNOD
|
|
(since Linux 2.2.30).
|
|
If the file system UID is changed from nonzero to 0,
|
|
then any of these capabilities that are enabled in the permitted set
|
|
are enabled in the effective set.
|
|
.PP
|
|
If a thread that has a 0 value for one or more of its user IDs wants
|
|
to prevent its permitted capability set being cleared when it resets
|
|
all of its user IDs to nonzero values, it can do so using the
|
|
.BR prctl (2)
|
|
.B PR_SET_KEEPCAPS
|
|
operation.
|
|
.\"
|
|
.SS Programmatically adjusting capability sets
|
|
A thread can retrieve and change its capability sets using the
|
|
.BR capget (2)
|
|
and
|
|
.BR capset (2)
|
|
system calls.
|
|
However, the use of
|
|
.BR cap_get_proc (3)
|
|
and
|
|
.BR cap_set_proc (3),
|
|
both provided in the
|
|
.I libcap
|
|
package,
|
|
is preferred for this purpose.
|
|
The following rules govern changes to the thread capability sets:
|
|
.IP 1. 3
|
|
If the caller does not have the
|
|
.B CAP_SETPCAP
|
|
capability,
|
|
the new inheritable set must be a subset of the combination
|
|
of the existing inheritable and permitted sets.
|
|
.IP 2.
|
|
(Since kernel 2.6.25)
|
|
The new inheritable set must be a subset of the combination of the
|
|
existing inheritable set and the capability bounding set.
|
|
.IP 3.
|
|
The new permitted set must be a subset of the existing permitted set
|
|
(i.e., it is not possible to acquire permitted capabilities
|
|
that the thread does not currently have).
|
|
.IP 4.
|
|
The new effective set must be a subset of the new permitted set.
|
|
.SS The securebits flags: establishing a capabilities-only environment
|
|
.\" For some background:
|
|
.\" see http://lwn.net/Articles/280279/ and
|
|
.\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/
|
|
Starting with kernel 2.6.26,
|
|
and with a kernel in which file capabilities are enabled,
|
|
Linux implements a set of per-thread
|
|
.I securebits
|
|
flags that can be used to disable special handling of capabilities for UID 0
|
|
.RI ( root ).
|
|
These flags are as follows:
|
|
.TP
|
|
.B SECBIT_KEEP_CAPS
|
|
Setting this flag allows a thread that has one or more 0 UIDs to retain
|
|
its capabilities when it switches all of its UIDs to a nonzero value.
|
|
If this flag is not set,
|
|
then such a UID switch causes the thread to lose all capabilities.
|
|
This flag is always cleared on an
|
|
.BR execve (2).
|
|
(This flag provides the same functionality as the older
|
|
.BR prctl (2)
|
|
.B PR_SET_KEEPCAPS
|
|
operation.)
|
|
.TP
|
|
.B SECBIT_NO_SETUID_FIXUP
|
|
Setting this flag stops the kernel from adjusting capability sets when
|
|
the threads's effective and file system UIDs are switched between
|
|
zero and nonzero values.
|
|
(See the subsection
|
|
.IR "Effect of User ID Changes on Capabilities" .)
|
|
.TP
|
|
.B SECBIT_NOROOT
|
|
If this bit is set, then the kernel does not grant capabilities
|
|
when a set-user-ID-root program is executed, or when a process with
|
|
an effective or real UID of 0 calls
|
|
.BR execve (2).
|
|
(See the subsection
|
|
.IR "Capabilities and execution of programs by root" .)
|
|
.PP
|
|
Each of the above "base" flags has a companion "locked" flag.
|
|
Setting any of the "locked" flags is irreversible,
|
|
and has the effect of preventing further changes to the
|
|
corresponding "base" flag.
|
|
The locked flags are:
|
|
.BR SECBIT_KEEP_CAPS_LOCKED ,
|
|
.BR SECBIT_NO_SETUID_FIXUP_LOCKED ,
|
|
and
|
|
.BR SECBIT_NOROOT_LOCKED .
|
|
.PP
|
|
The
|
|
.I securebits
|
|
flags can be modified and retrieved using the
|
|
.BR prctl (2)
|
|
.B PR_SET_SECUREBITS
|
|
and
|
|
.B PR_GET_SECUREBITS
|
|
operations.
|
|
The
|
|
.B CAP_SETPCAP
|
|
capability is required to modify the flags.
|
|
|
|
The
|
|
.I securebits
|
|
flags are inherited by child processes.
|
|
During an
|
|
.BR execve (2),
|
|
all of the flags are preserved, except
|
|
.B SECBIT_KEEP_CAPS
|
|
which is always cleared.
|
|
|
|
An application can use the following call to lock itself,
|
|
and all of its descendants,
|
|
into an environment where the only way of gaining capabilities
|
|
is by executing a program with associated file capabilities:
|
|
.in +4n
|
|
.nf
|
|
|
|
prctl(PR_SET_SECUREBITS,
|
|
SECBIT_KEEP_CAPS_LOCKED |
|
|
SECBIT_NO_SETUID_FIXUP |
|
|
SECBIT_NO_SETUID_FIXUP_LOCKED |
|
|
SECBIT_NOROOT |
|
|
SECBIT_NOROOT_LOCKED);
|
|
.fi
|
|
.in
|
|
.SH CONFORMING TO
|
|
.PP
|
|
No standards govern capabilities, but the Linux capability implementation
|
|
is based on the withdrawn POSIX.1e draft standard; see
|
|
.UR http://wt.tuxomania.net\:/publications\:/posix.1e/
|
|
.UE .
|
|
.SH NOTES
|
|
Since kernel 2.5.27, capabilities are an optional kernel component,
|
|
and can be enabled/disabled via the CONFIG_SECURITY_CAPABILITIES
|
|
kernel configuration option.
|
|
|
|
The
|
|
.I /proc/PID/task/TID/status
|
|
file can be used to view the capability sets of a thread.
|
|
The
|
|
.I /proc/PID/status
|
|
file shows the capability sets of a process's main thread.
|
|
Before Linux 3.8, nonexistent capabilities were shown as being
|
|
enabled (1) in these sets.
|
|
Since Linux 3.8,
|
|
.\" 7b9a7ec565505699f503b4fcf61500dceb36e744
|
|
all non-existent capabilities (above
|
|
.B CAP_LAST_CAP)
|
|
are shown as disabled (0).
|
|
|
|
The
|
|
.I libcap
|
|
package provides a suite of routines for setting and
|
|
getting capabilities that is more comfortable and less likely
|
|
to change than the interface provided by
|
|
.BR capset (2)
|
|
and
|
|
.BR capget (2).
|
|
This package also provides the
|
|
.BR setcap (8)
|
|
and
|
|
.BR getcap (8)
|
|
programs.
|
|
It can be found at
|
|
.br
|
|
.UR http://www.kernel.org\:/pub\:/linux\:/libs\:/security\:/linux-privs
|
|
.UE .
|
|
|
|
Before kernel 2.6.24, and since kernel 2.6.24 if
|
|
file capabilities are not enabled, a thread with the
|
|
.B CAP_SETPCAP
|
|
capability can manipulate the capabilities of threads other than itself.
|
|
However, this is only theoretically possible,
|
|
since no thread ever has
|
|
.BR CAP_SETPCAP
|
|
in either of these cases:
|
|
.IP * 2
|
|
In the pre-2.6.25 implementation the system-wide capability bounding set,
|
|
.IR /proc/sys/kernel/cap-bound ,
|
|
always masks out this capability, and this can not be changed
|
|
without modifying the kernel source and rebuilding.
|
|
.IP *
|
|
If file capabilities are disabled in the current implementation, then
|
|
.B init
|
|
starts out with this capability removed from its per-process bounding
|
|
set, and that bounding set is inherited by all other processes
|
|
created on the system.
|
|
.SH SEE ALSO
|
|
.BR capget (2),
|
|
.BR prctl (2),
|
|
.BR setfsuid (2),
|
|
.BR cap_clear (3),
|
|
.BR cap_copy_ext (3),
|
|
.BR cap_from_text (3),
|
|
.BR cap_get_file (3),
|
|
.BR cap_get_proc (3),
|
|
.BR cap_init (3),
|
|
.BR capgetp (3),
|
|
.BR capsetp (3),
|
|
.BR libcap (3),
|
|
.BR credentials (7),
|
|
.BR pthreads (7),
|
|
.BR getcap (8),
|
|
.BR setcap (8)
|
|
.PP
|
|
.I include/linux/capability.h
|
|
in the Linux kernel source tree
|