bpf.2: Minor tweaks to Daniel Borkmann's patch

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2015-08-05 22:07:09 +02:00
parent 9a818dddcf
commit cd579c3f1a
1 changed file with 62 additions and 48 deletions


@@ -53,7 +53,7 @@ and access shared data structures such as eBPF maps.
 .SS Extended BPF Design/Architecture
 eBPF maps are a generic data structure for storage of different data types.
 Data types are generally treated as binary blobs, so a user just specifies
-the size of the key and the size of the value at map creation time.
+the size of the key and the size of the value at map-creation time.
 In other words, a key/value for a given map can have an arbitrary structure.
 A user process can create multiple maps (with key/value-pairs being
@@ -62,35 +62,38 @@ Different eBPF programs can access the same maps in parallel.
 It's up to the user process and eBPF program to decide what they store
 inside maps.
-There's one special map type which is a program array.
-This map stores file descriptors to other eBPF programs.
-Thus, when a lookup in that map is performed, the program flow is
-redirected in-place to the beginning of the new eBPF program without
-returning back.
+There's one special map type, called a program array.
+This type of map stores file descriptors referring to other eBPF programs.
+When a lookup in the map is performed, the program flow is
+redirected in-place to the beginning of another eBPF program and does not
+return back to the calling program.
 The level of nesting has a fixed limit of 32, so that infinite loops cannot
 be crafted.
-During runtime, the program file descriptors stored in that map can be modified,
+At runtime, the program file descriptors stored in the map can be modified,
 so program functionality can be altered based on specific requirements.
-All programs stored in such a map have been loaded into the kernel via
-.BR bpf ()
-as well.
-In case a lookup has failed, the current program continues its execution.
-See BPF_MAP_TYPE_PROG_ARRAY below for further details.
+All programs referred to in a program-array map must
+have been previously loaded into the kernel via
+.BR bpf ().
+If a map lookup fails, the current program continues its execution.
+See
+.B BPF_MAP_TYPE_PROG_ARRAY
+below for further details.
 .P
 Generally, eBPF programs are loaded by the user process and automatically
-unloaded when the process exits. In some cases, for example,
+unloaded when the process exits.
+In some cases, for example,
 .BR tc-bpf (8),
 the program will continue to stay alive inside the kernel even after the
 the process that loaded the program exits.
-In that case, the tc subsystem holds a reference to the program after the
-file descriptor has been dropped by the user.
+In that case,
+the tc subsystem holds a reference to the eBPF program after the
+file descriptor has been closed by the user-space program.
 Thus, whether a specific program continues to live inside the kernel
 depends on how it is further attached to a given kernel subsystem
 after it was loaded via
-.BR bpf ()
-\.
-Each program is a set of instructions that is safe to run until
+.BR bpf ().
+Each eBPF program is a set of instructions that is safe to run until
 its completion.
 An in-kernel verifier statically determines that the eBPF program
 terminates and is safe to execute.
@@ -114,15 +117,15 @@ eBPF programs can access the same map:
 .in +4n
 .nf
 tracing tracing tracing packet packet packet
 event A event B event C on eth0 on eth1 on eth2
 | | | | | ^
 | | | | v |
 --> tracing <-- tracing socket tc ingress tc egress
 prog_1 prog_2 prog_3 classifier action
 | | | | prog_4 prog_5
-|--- -----| |-------| map_3 | |
+|--- -----| |------| map_3 | |
 map_1 map_2 --| map_4 |--
 .fi
 .in
 .\"
@@ -616,15 +619,16 @@ since elements cannot be deleted.
 replaces elements in a
 .B nonatomic
 fashion;
-for atomic updates, a hash-table map should be used instead. There is
-however one special case that can also be used with arrays: the atomic
-built-in
+for atomic updates, a hash-table map should be used instead.
+There is however one special case that can also be used with arrays:
+the atomic built-in
 .BR __sync_fetch_and_add()
-can be used on 32 and 64 bit atomic counters. For example, it can be
+can be used on 32 and 64 bit atomic counters.
+For example, it can be
 applied on the whole value itself if it represents a single counter,
 or in case of a structure containing multiple counters, it could be
-used on individual ones. This is quite often useful for aggregation
-and accounting of events.
+used on individual counters.
+This is quite often useful for aggregation and accounting of events.
 .RE
 .IP
 Among the uses for array maps are the following:
@@ -641,11 +645,15 @@ sizes.
 .RE
 .TP
 .BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
-A program array map is a special kind of array map, whose map values only
-contain valid file descriptors to other eBPF programs. Thus both the
-key_size and value_size must be exactly four bytes.
+A program array map is a special kind of array map whose map values
+contain only file descriptors referring to other eBPF programs.
+Thus, both the
+.I key_size
+and
+.I value_size
+must be exactly four bytes.
 This map is used in conjunction with the
-.BR bpf_tail_call()
+.BR bpf_tail_call ()
 helper.
 This means that an eBPF program with a program array map attached to it
@@ -658,23 +666,29 @@ void bpf_tail_call(void *context, void *prog_map, unsigned int index);
 .in
 and therefore replace its own program flow with the one from the program
-at the given program array slot if present. This can be regarded as kind
-of a jump table to a different eBPF program. The invoked program will then
-reuse the same stack. When a jump into the new program has been performed,
-it won't return to the old one anymore.
+at the given program array slot, if present.
+This can be regarded as kind of a jump table to a different eBPF program.
+The invoked program will then reuse the same stack.
+When a jump into the new program has been performed,
+it won't return to the old program anymore.
 If no eBPF program is found at the given index of the program array,
+.\" FIXME The array does not contain eBPF programs, but rather file
+.\" descriptors. So, what does "no eBPF program is found" here
+.\" really mean?
 execution continues with the current eBPF program.
 This can be used as a fall-through for default cases.
 A program array map is useful, for example, in tracing or networking, to
-handle individual system calls resp. protocols in its own sub-programs and
-use their identifiers as an individual map index. This approach may result
-in performance benefits, and also makes it possible to overcome the maximum
-instruction limit of a single program.
-In dynamic environments, a user space daemon may atomically replace individual
-sub-programs at run-time with newer versions to alter overall program
-behavior, for instance, when global policies might change.
+handle individual system calls or protocols in their own subprograms and
+use their identifiers as an individual map index.
+This approach may result in performance benefits,
+and also makes it possible to overcome the maximum
+instruction limit of a single eBPF program.
+In dynamic environments,
+a user-space daemon might atomically replace individual subprograms
+at run-time with newer versions to alter overall program behavior,
+for instance, if global policies change.
 .\"
 .SS eBPF programs
 The