mirror of https://github.com/mkerrisk/man-pages
bpf.2: Minor tweaks to Daniel Borkmann's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 9a818dddcf
commit cd579c3f1a
110
man2/bpf.2
|
@@ -53,7 +53,7 @@ and access shared data structures such as eBPF maps.
|
||||||
.SS Extended BPF Design/Architecture
|
.SS Extended BPF Design/Architecture
|
||||||
eBPF maps are a generic data structure for storage of different data types.
|
eBPF maps are a generic data structure for storage of different data types.
|
||||||
Data types are generally treated as binary blobs, so a user just specifies
|
Data types are generally treated as binary blobs, so a user just specifies
|
||||||
the size of the key and the size of the value at map creation time.
|
the size of the key and the size of the value at map-creation time.
|
||||||
In other words, a key/value for a given map can have an arbitrary structure.
|
In other words, a key/value for a given map can have an arbitrary structure.
|
||||||
|
|
||||||
A user process can create multiple maps (with key/value-pairs being
|
A user process can create multiple maps (with key/value-pairs being
|
||||||
|
@@ -62,35 +62,38 @@ Different eBPF programs can access the same maps in parallel.
|
||||||
It's up to the user process and eBPF program to decide what they store
|
It's up to the user process and eBPF program to decide what they store
|
||||||
inside maps.
|
inside maps.
|
||||||
|
|
||||||
There's one special map type which is a program array.
|
There's one special map type, called a program array.
|
||||||
This map stores file descriptors to other eBPF programs.
|
This type of map stores file descriptors referring to other eBPF programs.
|
||||||
Thus, when a lookup in that map is performed, the program flow is
|
When a lookup in the map is performed, the program flow is
|
||||||
redirected in-place to the beginning of the new eBPF program without
|
redirected in-place to the beginning of another eBPF program and does not
|
||||||
returning back.
|
return back to the calling program.
|
||||||
The level of nesting has a fixed limit of 32, so that infinite loops cannot
|
The level of nesting has a fixed limit of 32, so that infinite loops cannot
|
||||||
be crafted.
|
be crafted.
|
||||||
During runtime, the program file descriptors stored in that map can be modified,
|
At runtime, the program file descriptors stored in the map can be modified,
|
||||||
so program functionality can be altered based on specific requirements.
|
so program functionality can be altered based on specific requirements.
|
||||||
All programs stored in such a map have been loaded into the kernel via
|
All programs referred to in a program-array map must
|
||||||
.BR bpf ()
|
have been previously loaded into the kernel via
|
||||||
as well.
|
.BR bpf ().
|
||||||
In case a lookup has failed, the current program continues its execution.
|
If a map lookup fails, the current program continues its execution.
|
||||||
See BPF_MAP_TYPE_PROG_ARRAY below for further details.
|
See
|
||||||
|
.B BPF_MAP_TYPE_PROG_ARRAY
|
||||||
|
below for further details.
|
||||||
.P
|
.P
|
||||||
Generally, eBPF programs are loaded by the user process and automatically
|
Generally, eBPF programs are loaded by the user process and automatically
|
||||||
unloaded when the process exits. In some cases, for example,
|
unloaded when the process exits.
|
||||||
|
In some cases, for example,
|
||||||
.BR tc-bpf (8),
|
.BR tc-bpf (8),
|
||||||
the program will continue to stay alive inside the kernel even after the
|
the program will continue to stay alive inside the kernel even after the
|
||||||
the process that loaded the program exits.
|
the process that loaded the program exits.
|
||||||
In that case, the tc subsystem holds a reference to the program after the
|
In that case,
|
||||||
file descriptor has been dropped by the user.
|
the tc subsystem holds a reference to the eBPF program after the
|
||||||
|
file descriptor has been closed by the user-space program.
|
||||||
Thus, whether a specific program continues to live inside the kernel
|
Thus, whether a specific program continues to live inside the kernel
|
||||||
depends on how it is further attached to a given kernel subsystem
|
depends on how it is further attached to a given kernel subsystem
|
||||||
after it was loaded via
|
after it was loaded via
|
||||||
.BR bpf ()
|
.BR bpf ().
|
||||||
\.
|
|
||||||
|
|
||||||
Each program is a set of instructions that is safe to run until
|
Each eBPF program is a set of instructions that is safe to run until
|
||||||
its completion.
|
its completion.
|
||||||
An in-kernel verifier statically determines that the eBPF program
|
An in-kernel verifier statically determines that the eBPF program
|
||||||
terminates and is safe to execute.
|
terminates and is safe to execute.
|
||||||
|
@@ -114,15 +117,15 @@ eBPF programs can access the same map:
|
||||||
|
|
||||||
.in +4n
|
.in +4n
|
||||||
.nf
|
.nf
|
||||||
tracing tracing tracing packet packet packet
|
tracing tracing tracing packet packet packet
|
||||||
event A event B event C on eth0 on eth1 on eth2
|
event A event B event C on eth0 on eth1 on eth2
|
||||||
| | | | | ^
|
| | | | | ^
|
||||||
| | | | v |
|
| | | | v |
|
||||||
--> tracing <-- tracing socket tc ingress tc egress
|
--> tracing <-- tracing socket tc ingress tc egress
|
||||||
prog_1 prog_2 prog_3 classifier action
|
prog_1 prog_2 prog_3 classifier action
|
||||||
| | | | prog_4 prog_5
|
| | | | prog_4 prog_5
|
||||||
|--- -----| |-------| map_3 | |
|
|--- -----| |------| map_3 | |
|
||||||
map_1 map_2 --| map_4 |--
|
map_1 map_2 --| map_4 |--
|
||||||
.fi
|
.fi
|
||||||
.in
|
.in
|
||||||
.\"
|
.\"
|
||||||
|
@@ -616,15 +619,16 @@ since elements cannot be deleted.
|
||||||
replaces elements in a
|
replaces elements in a
|
||||||
.B nonatomic
|
.B nonatomic
|
||||||
fashion;
|
fashion;
|
||||||
for atomic updates, a hash-table map should be used instead. There is
|
for atomic updates, a hash-table map should be used instead.
|
||||||
however one special case that can also be used with arrays: the atomic
|
There is however one special case that can also be used with arrays:
|
||||||
built-in
|
the atomic built-in
|
||||||
.BR __sync_fetch_and_add()
|
.BR __sync_fetch_and_add()
|
||||||
can be used on 32 and 64 bit atomic counters. For example, it can be
|
can be used on 32 and 64 bit atomic counters.
|
||||||
|
For example, it can be
|
||||||
applied on the whole value itself if it represents a single counter,
|
applied on the whole value itself if it represents a single counter,
|
||||||
or in case of a structure containing multiple counters, it could be
|
or in case of a structure containing multiple counters, it could be
|
||||||
used on individual ones. This is quite often useful for aggregation
|
used on individual counters.
|
||||||
and accounting of events.
|
This is quite often useful for aggregation and accounting of events.
|
||||||
.RE
|
.RE
|
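Review note on the counter paragraph in this hunk: the per-counter use of the atomic built-in can be illustrated with a minimal user-space sketch in C. It assumes a GCC- or Clang-compatible compiler that provides `__sync_fetch_and_add()`. The `struct pkt_stats` layout and the `account_packet()` function are hypothetical names chosen for illustration and do not appear in bpf.2. In an actual eBPF program the pointer would come from a map lookup rather than from the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value layout for one array-map element: several counters. */
struct pkt_stats {
    uint64_t packets;   /* 64-bit counter */
    uint64_t bytes;     /* 64-bit counter */
    uint32_t drops;     /* 32-bit counter */
};

/*
 * Account one event.  Each counter is incremented atomically on its own;
 * the structure as a whole is NOT updated atomically.
 */
void account_packet(struct pkt_stats *st, uint32_t len, int dropped)
{
    assert(st != NULL);
    __sync_fetch_and_add(&st->packets, 1);
    __sync_fetch_and_add(&st->bytes, len);
    if (dropped)
        __sync_fetch_and_add(&st->drops, 1);
}
```

Calling `account_packet(&st, 100, 0)` and then `account_packet(&st, 60, 1)` on a zeroed structure leaves `packets` at 2, `bytes` at 160, and `drops` at 1. A reader who inspects the structure between the individual increments can still observe a partially updated set of counters, which is consistent with the text's statement that array element replacement itself is nonatomic.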
||||||
.IP
|
.IP
|
||||||
Among the uses for array maps are the following:
|
Among the uses for array maps are the following:
|
||||||
|
@@ -641,11 +645,15 @@ sizes.
|
||||||
.RE
|
.RE
|
||||||
.TP
|
.TP
|
||||||
.BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
|
.BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
|
||||||
A program array map is a special kind of array map, whose map values only
|
A program array map is a special kind of array map whose map values
|
||||||
contain valid file descriptors to other eBPF programs. Thus both the
|
contain only file descriptors referring to other eBPF programs.
|
||||||
key_size and value_size must be exactly four bytes.
|
Thus, both the
|
||||||
|
.I key_size
|
||||||
|
and
|
||||||
|
.I value_size
|
||||||
|
must be exactly four bytes.
|
||||||
This map is used in conjunction with the
|
This map is used in conjunction with the
|
||||||
.BR bpf_tail_call()
|
.BR bpf_tail_call ()
|
||||||
helper.
|
helper.
|
||||||
|
|
||||||
This means that an eBPF program with a program array map attached to it
|
This means that an eBPF program with a program array map attached to it
|
||||||
|
@@ -658,23 +666,29 @@ void bpf_tail_call(void *context, void *prog_map, unsigned int index);
|
||||||
.in
|
.in
|
||||||
|
|
||||||
and therefore replace its own program flow with the one from the program
|
and therefore replace its own program flow with the one from the program
|
||||||
at the given program array slot if present. This can be regarded as kind
|
at the given program array slot, if present.
|
||||||
of a jump table to a different eBPF program. The invoked program will then
|
This can be regarded as kind of a jump table to a different eBPF program.
|
||||||
reuse the same stack. When a jump into the new program has been performed,
|
The invoked program will then reuse the same stack.
|
||||||
it won't return to the old one anymore.
|
When a jump into the new program has been performed,
|
||||||
|
it won't return to the old program anymore.
|
||||||
|
|
||||||
If no eBPF program is found at the given index of the program array,
|
If no eBPF program is found at the given index of the program array,
|
||||||
|
.\" FIXME The array does not contain eBPF programs, but rather file
|
||||||
|
.\" descriptors. So, what does "no eBPF program is found" here
|
||||||
|
.\" really mean?
|
||||||
execution continues with the current eBPF program.
|
execution continues with the current eBPF program.
|
||||||
This can be used as a fall-through for default cases.
|
This can be used as a fall-through for default cases.
|
||||||
|
|
||||||
A program array map is useful, for example, in tracing or networking, to
|
A program array map is useful, for example, in tracing or networking, to
|
||||||
handle individual system calls resp. protocols in its own sub-programs and
|
handle individual system calls or protocols in their own subprograms and
|
||||||
use their identifiers as an individual map index. This approach may result
|
use their identifiers as an individual map index.
|
||||||
in performance benefits, and also makes it possible to overcome the maximum
|
This approach may result in performance benefits,
|
||||||
instruction limit of a single program.
|
and also makes it possible to overcome the maximum
|
||||||
In dynamic environments, a user space daemon may atomically replace individual
|
instruction limit of a single eBPF program.
|
||||||
sub-programs at run-time with newer versions to alter overall program
|
In dynamic environments,
|
||||||
behavior, for instance, when global policies might change.
|
a user-space daemon might atomically replace individual subprograms
|
||||||
|
at run-time with newer versions to alter overall program behavior,
|
||||||
|
for instance, if global policies change.
|
||||||
.\"
|
.\"
|
||||||
.SS eBPF programs
|
.SS eBPF programs
|
||||||
The
|
The
|
||||||
|
|
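Review note on the BPF_MAP_TYPE_PROG_ARRAY hunks above: the jump-table behavior, the fall-through on an empty slot, and the nesting limit of 32 can be modeled in plain user-space C. This is only a sketch under stated assumptions, not kernel code. Function pointers stand in for the program file descriptors, and ordinary C calls stand in for the jump, so unlike a real tail call the model does not reuse the caller's stack frame. All names (`model_tail_call()`, `struct prog_array`, `dispatch()`, `handle_proto()`, `loop_prog()`, `PROG_SLOTS`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAIL_CALLS 32  /* nesting limit quoted in the man-page text */
#define PROG_SLOTS     4   /* hypothetical size of the program array */

struct ctx {
    unsigned int proto;       /* identifier used as the map index */
    unsigned int tail_calls;  /* tail calls performed so far */
};

struct prog_array;
typedef int (*prog_fn)(struct ctx *, const struct prog_array *);

/* Stand-in for a program array; NULL models an empty slot. */
struct prog_array {
    prog_fn slot[PROG_SLOTS];
};

/*
 * Model of the helper.  Returns 1 and stores the target's result in *ret
 * when the jump happens; returns 0 when the caller must continue.
 */
int model_tail_call(struct ctx *c, const struct prog_array *m,
                    unsigned int index, int *ret)
{
    assert(c != NULL && m != NULL && ret != NULL);
    if (index >= PROG_SLOTS || m->slot[index] == NULL)
        return 0;                   /* no program at this index */
    if (c->tail_calls >= MAX_TAIL_CALLS)
        return 0;                   /* nesting limit reached */
    c->tail_calls++;
    *ret = m->slot[index](c, m);
    return 1;
}

/* Subprogram handling one protocol. */
int handle_proto(struct ctx *c, const struct prog_array *m)
{
    (void)c;
    (void)m;
    return 1;
}

/* Subprogram that keeps jumping to its own slot until the limit stops it. */
int loop_prog(struct ctx *c, const struct prog_array *m)
{
    int ret;

    if (model_tail_call(c, m, c->proto, &ret))
        return ret;
    return 0;                       /* continues here once the limit hits */
}

/* Entry program: dispatch on c->proto, with a default case. */
int dispatch(struct ctx *c, const struct prog_array *m)
{
    int ret;

    if (model_tail_call(c, m, c->proto, &ret))
        return ret;                 /* models the jump that does not return */
    return -1;                      /* fall-through: default case */
}
```

With `handle_proto` in slot 1 and `loop_prog` in slot 3, `dispatch()` returns the handler's result for `proto == 1`, takes the default case for an empty or out-of-range slot, and stops `loop_prog` after 32 jumps. Replacing an entry in `slot[]` at run time corresponds to the user-space daemon swapping subprograms as described in the final paragraph of the hunk.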