.\" Man page generated from reStructuredText.
|
|
.\" Copyright (C) All BPF authors and contributors from 2014 to present.
|
|
.\" See git log include/uapi/linux/bpf.h in kernel tree for details.
|
|
.\"
|
|
.\" %%%LICENSE_START(VERBATIM)
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
.\" preserved on all copies.
|
|
.\"
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
.\" permission notice identical to this one.
|
|
.\"
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
.\" have taken the same level of care in the production of this manual,
|
|
.\" which is licensed free of charge, as they might when working
|
|
.\" professionally.
|
|
.\"
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\" Please do not edit this file. It was generated from the documentation
|
|
.\" located in file include/uapi/linux/bpf.h of the Linux kernel sources
|
|
.\" (helpers description), and from scripts/bpf_helpers_doc.py in the same
|
|
.\" repository (header and footer).
|
|
.
|
|
.TH BPF-HELPERS 7 "" "" ""
|
|
.SH NAME
|
|
BPF-HELPERS \- list of eBPF helper functions
|
|
.
|
|
.nr rst2man-indent-level 0
|
|
.
|
|
.de1 rstReportMargin
|
|
\\$1 \\n[an-margin]
|
|
level \\n[rst2man-indent-level]
|
|
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
-
|
|
\\n[rst2man-indent0]
|
|
\\n[rst2man-indent1]
|
|
\\n[rst2man-indent2]
|
|
..
|
|
.de1 INDENT
|
|
.\" .rstReportMargin pre:
|
|
. RS \\$1
|
|
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
|
|
. nr rst2man-indent-level +1
|
|
.\" .rstReportMargin post:
|
|
..
|
|
.de UNINDENT
|
|
. RE
|
|
.\" indent \\n[an-margin]
|
|
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
.nr rst2man-indent-level -1
|
|
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
|
|
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
|
|
..
|
|
.SH DESCRIPTION
|
|
.sp
|
|
The extended Berkeley Packet Filter (eBPF) subsystem consists of programs
written in a pseudo\-assembly language, then attached to one of several
kernel hooks and run in reaction to specific events. This framework differs
from the older, "classic" BPF (or "cBPF") in several aspects, one of them being
the ability to call special functions (or "helpers") from within a program.
These functions are restricted to a white\-list of helpers defined in the
kernel.
|
|
.sp
|
|
These helpers are used by eBPF programs to interact with the system, or with
|
|
the context in which they work. For instance, they can be used to print
|
|
debugging messages, to get the time since the system was booted, to interact
|
|
with eBPF maps, or to manipulate network packets. Since there are several eBPF
program types, and they do not run in the same context, each program type
can only call a subset of those helpers.
|
|
.sp
|
|
Due to eBPF conventions, a helper cannot have more than five arguments.
|
|
.sp
|
|
Internally, eBPF programs call directly into the compiled helper functions
|
|
without requiring any foreign\-function interface. As a result, calling a
helper costs about as much as a regular function call, which keeps overhead low.
|
|
.sp
|
|
This document is an attempt to list and document the helpers available to eBPF
|
|
developers. They are sorted by chronological order (the oldest helpers in the
|
|
kernel at the top).
|
|
.SH HELPERS
|
|
.INDENT 0.0
|
|
.TP
|
|
.B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Perform a lookup in \fImap\fP for an entry associated to \fIkey\fP\&.
|
|
.TP
|
|
.B Return
|
|
Map value associated to \fIkey\fP, or \fBNULL\fP if no entry was
|
|
found.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Add or update the value of the entry associated to \fIkey\fP in
|
|
\fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_NOEXIST\fP
|
|
The entry for \fIkey\fP must not exist in the map.
|
|
.TP
|
|
.B \fBBPF_EXIST\fP
|
|
The entry for \fIkey\fP must already exist in the map.
|
|
.TP
|
|
.B \fBBPF_ANY\fP
|
|
No condition on the existence of the entry for \fIkey\fP\&.
|
|
.UNINDENT
|
|
.sp
|
|
Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types
\fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all
elements always exist); the helper would return an error.
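.sp
As a rough illustration of the three flag values, the following
user\-space sketch models how the flag and the presence of the key decide
the outcome. The function name \fBupdate_allowed\fP() is hypothetical, the
flag values are copied from \fIinclude/uapi/linux/bpf.h\fP, and the error
codes are an assumption and may differ between map types.
.sp
.nf
.ft C
```c
#include <errno.h>
#include <stdint.h>

/* Flag values as found in include/uapi/linux/bpf.h, redefined here so
 * that the sketch builds in user space without kernel headers. */
#define BPF_ANY     0 /* no condition on the existence of the entry */
#define BPF_NOEXIST 1 /* the entry must not exist yet */
#define BPF_EXIST   2 /* the entry must already exist */

/* Hypothetical model: return 0 if the update may proceed, or a negative
 * error. The error codes are an assumption, not taken from the text. */
int update_allowed(int key_exists, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;  /* unknown flag value */
    if (flags == BPF_NOEXIST && key_exists)
        return -EEXIST;  /* entry is already present */
    if (flags == BPF_EXIST && !key_exists)
        return -ENOENT;  /* nothing to update */
    return 0;
}
```
.ft P
.fi
.sp
In this model an array map always passes a non\-zero \fIkey_exists\fP,
so \fBBPF_NOEXIST\fP can never succeed on it.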
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Delete entry with \fIkey\fP from \fImap\fP\&.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
For tracing programs, safely attempt to read \fIsize\fP bytes from
|
|
address \fIsrc\fP and store the data in \fIdst\fP\&.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_ktime_get_ns(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Return the time elapsed since system boot, in nanoseconds.
|
|
.TP
|
|
.B Return
|
|
Current \fIktime\fP\&.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is a "printk()\-like" facility for debugging. It
|
|
prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP)
|
|
to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if
|
|
available. It can take up to three additional \fBu64\fP
arguments (as with any eBPF helper, the total number of arguments is
limited to five).
|
|
.sp
|
|
Each time the helper is called, it appends a line to the trace.
|
|
The format of the trace is customizable, and the exact output
|
|
one will get depends on the options set in
|
|
\fI/sys/kernel/debug/tracing/trace_options\fP (see also the
|
|
\fIREADME\fP file under the same directory). However, it usually
|
|
defaults to something like:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.sp
|
|
In the above:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.INDENT 0.0
|
|
.IP \(bu 2
|
|
\fBtelnet\fP is the name of the current task.
|
|
.IP \(bu 2
|
|
\fB470\fP is the PID of the current task.
|
|
.IP \(bu 2
|
|
\fB001\fP is the CPU number on which the task is
|
|
running.
|
|
.IP \(bu 2
|
|
In \fB\&.N..\fP, each character refers to a set of
|
|
options (whether irqs are enabled, scheduling
|
|
options, whether hard/softirqs are running, level of
|
|
preempt_disabled respectively). \fBN\fP means that
|
|
\fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP
|
|
are set.
|
|
.IP \(bu 2
|
|
\fB419421.045894\fP is a timestamp.
|
|
.IP \(bu 2
|
|
\fB0x00000001\fP is a fake value used by BPF for the
|
|
instruction pointer register.
|
|
.IP \(bu 2
|
|
\fB<formatted msg>\fP is the message formatted with
|
|
\fIfmt\fP\&.
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.sp
|
|
The conversion specifiers supported by \fIfmt\fP are similar to, but
more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP,
|
|
\fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP,
|
|
\fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size
|
|
of field, padding with zeroes, etc.) is available, and the
|
|
helper will return \fB\-EINVAL\fP (but print nothing) if it
|
|
encounters an unknown specifier.
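.sp
The rules above can be approximated by a small user\-space check. This is
only a sketch of the documented rules, not the kernel implementation: the
function name \fBcheck_trace_fmt\fP() is hypothetical, and rejecting a
fourth conversion with \fB\-EINVAL\fP is an assumption derived from the
limit of three additional arguments.
.sp
.nf
.ft C
```c
#include <errno.h>
#include <string.h>

/* Specifiers listed in the text, longest first so that "lld" is tried
 * before "ld" and "d". */
static const char *const supported[] = {
    "lld", "lli", "llu", "llx",
    "ld", "li", "lu", "lx",
    "d", "i", "u", "x", "p", "s",
};

/* Return 0 if fmt uses only supported specifiers, and at most three of
 * them; return -EINVAL otherwise. */
int check_trace_fmt(const char *fmt)
{
    size_t n = sizeof(supported) / sizeof(supported[0]);
    int conversions = 0;

    while ((fmt = strchr(fmt, '%')) != NULL) {
        size_t i;

        fmt++; /* skip the percent sign */
        for (i = 0; i < n; i++) {
            size_t len = strlen(supported[i]);

            if (strncmp(fmt, supported[i], len) == 0) {
                fmt += len;
                break;
            }
        }
        if (i == n)
            return -EINVAL; /* unknown specifier, or a modifier */
        if (++conversions > 3)
            return -EINVAL; /* assumption: too many arguments */
    }
    return 0;
}
```
.ft P
.fi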
|
|
.sp
|
|
Also, note that \fBbpf_trace_printk\fP() is slow, and should
|
|
only be used for debugging purposes. For this reason, a notice
|
|
block (spanning several lines) is printed to kernel logs and
|
|
states that the helper should not be used "for production use"
|
|
the first time this helper is used (or more precisely, when
|
|
\fBtrace_printk\fP() buffers are allocated). For passing values
|
|
to user space, perf events should be preferred.
|
|
.TP
|
|
.B Return
|
|
The number of bytes written to the buffer, or a negative error
|
|
in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_prandom_u32(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Get a pseudo\-random number.
|
|
.sp
|
|
From a security point of view, this helper uses its own
|
|
pseudo\-random internal state, and cannot be used to infer the
|
|
seed of other random functions in the kernel. However, it is
|
|
essential to note that the generator used by the helper is not
|
|
cryptographically secure.
|
|
.TP
|
|
.B Return
|
|
A random 32\-bit unsigned value.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_smp_processor_id(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Get the SMP (symmetric multiprocessing) processor id. Note that
|
|
all programs run with preemption disabled, which means that the
|
|
SMP processor id is stable throughout the execution of the
|
|
program.
|
|
.TP
|
|
.B Return
|
|
The SMP id of the processor running the program.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Store \fIlen\fP bytes from address \fIfrom\fP into the packet
|
|
associated to \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of
|
|
\fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the
|
|
checksum for the packet after storing the bytes) and
|
|
\fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0).
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Recompute the layer 3 (e.g. IP) checksum for the packet
|
|
associated to \fIskb\fP\&. Computation is incremental, so the helper
|
|
must know the former value of the header field that was
|
|
modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
|
|
number of bytes (2 or 4) for this field, stored in \fIsize\fP\&.
|
|
Alternatively, it is possible to store the difference between
|
|
the previous and the new values of the header field in \fIto\fP, by
|
|
setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP
|
|
indicates the location of the IP checksum within the packet.
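.sp
To make the idea of an incremental update concrete, here is a user\-space
sketch for a 2\-byte field. It follows the one\(aqs complement update rule
from RFC 1624 and is not the kernel code; the function names
\fBcsum_full\fP() and \fBcsum_update16\fP() are hypothetical.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits using one's complement addition. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over n 16-bit words, for comparison only. */
uint16_t csum_full(const uint16_t *words, int n)
{
    uint32_t sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold(sum);
}

/* Incremental update: new = ~(~old + ~from + to), as in RFC 1624. */
uint16_t csum_update16(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~fold(sum);
}
```
.ft P
.fi
.sp
Only the old and new values of the modified field are needed; the rest
of the header is never read again.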
|
|
.sp
|
|
This helper works in combination with \fBbpf_csum_diff\fP(),
|
|
which does not update the checksum in\-place, but offers more
|
|
flexibility and can handle sizes larger than 2 or 4 for the
|
|
checksum to update.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the
|
|
packet associated to \fIskb\fP\&. Computation is incremental, so the
|
|
helper must know the former value of the header field that was
|
|
modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
|
|
number of bytes (2 or 4) for this field, stored on the lowest
|
|
four bits of \fIflags\fP\&. Alternatively, it is possible to store
|
|
the difference between the previous and the new values of the
|
|
header field in \fIto\fP, by setting \fIfrom\fP and the four lowest
|
|
bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the
|
|
location of the layer 4 checksum within the packet. In addition to
the size of the field, actual flags can be added to \fIflags\fP with a
bitwise OR. With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left
|
|
untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and
|
|
for updates resulting in a null checksum the value is set to
|
|
\fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates
|
|
the checksum is to be computed against a pseudo\-header.
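.sp
The packing of the field size and the flags into a single \fIflags\fP
value can be sketched as follows. The constants mirror
\fIinclude/uapi/linux/bpf.h\fP and are redefined so that the sketch
builds in user space; the function name \fBl4_csum_flags\fP() is
hypothetical.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Values mirroring include/uapi/linux/bpf.h. */
#define BPF_F_HDR_FIELD_MASK 0xfULL      /* lowest four bits: field size */
#define BPF_F_PSEUDO_HDR     (1ULL << 4)
#define BPF_F_MARK_MANGLED_0 (1ULL << 5)
#define BPF_F_MARK_ENFORCE   (1ULL << 6)

/* Combine the size of the modified field (2 or 4, or 0 when a
 * difference is passed in "to") with the actual flags. */
uint64_t l4_csum_flags(unsigned int field_size, uint64_t extra)
{
    return ((uint64_t)field_size & BPF_F_HDR_FIELD_MASK) | extra;
}
```
.ft P
.fi
.sp
For example, a 4\-byte address change covered by a pseudo\-header would
use \fBl4_csum_flags(4, BPF_F_PSEUDO_HDR)\fP.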
|
|
.sp
|
|
This helper works in combination with \fBbpf_csum_diff\fP(),
|
|
which does not update the checksum in\-place, but offers more
|
|
flexibility and can handle sizes larger than 2 or 4 for the
|
|
checksum to update.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This special helper is used to trigger a "tail call", or in
|
|
other words, to jump into another eBPF program. The same stack
|
|
frame is used (but values on stack and in registers for the
|
|
caller are not accessible to the callee). This mechanism allows
|
|
for program chaining, either for raising the maximum number of
|
|
available eBPF instructions, or to execute given programs in
|
|
conditional blocks. For security reasons, there is an upper
|
|
limit to the number of successive tail calls that can be
|
|
performed.
|
|
.sp
|
|
Upon call of this helper, the program attempts to jump into a
|
|
program referenced at index \fIindex\fP in \fIprog_array_map\fP, a
|
|
special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes
|
|
\fIctx\fP, a pointer to the context.
|
|
.sp
|
|
If the call succeeds, the kernel immediately runs the first
|
|
instruction of the new program. This is not a function call,
|
|
and it never returns to the previous program. If the call
|
|
fails, then the helper has no effect, and the caller continues
|
|
to run its subsequent instructions. A call can fail if the
|
|
destination program for the jump does not exist (i.e. \fIindex\fP
is greater than or equal to the number of entries in \fIprog_array_map\fP), or
|
|
if the maximum number of tail calls has been reached for this
|
|
chain of programs. This limit is defined in the kernel by the
|
|
macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space),
|
|
which is currently set to 32.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Clone and redirect the packet associated to \fIskb\fP to another
|
|
net device of index \fIifindex\fP\&. Both ingress and egress
|
|
interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP
|
|
value in \fIflags\fP is used to make the distinction (ingress path
|
|
is selected if the flag is present, egress path otherwise).
|
|
This is the only flag supported for now.
|
|
.sp
|
|
In comparison with \fBbpf_redirect\fP() helper,
|
|
\fBbpf_clone_redirect\fP() has the associated cost of
|
|
duplicating the packet buffer, but the redirection is executed from within
|
|
the eBPF program. Conversely, \fBbpf_redirect\fP() is more
|
|
efficient, but it is handled through an action code where the
|
|
redirection happens only after the eBPF program has returned.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_current_pid_tgid(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Return
|
|
A 64\-bit integer containing the current tgid and pid, and
|
|
created as such:
|
|
\fIcurrent_task\fP\fB\->tgid << 32 |\fP
|
|
\fIcurrent_task\fP\fB\->pid\fP\&.
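.sp
Splitting the returned value back into its two halves is a matter of a
shift and a truncation. A minimal sketch, with the hypothetical names
\fBtgid_of\fP() and \fBpid_of\fP():
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Upper 32 bits: thread group id. */
uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: id of the current task. */
uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```
.ft P
.fi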
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_current_uid_gid(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Return
|
|
A 64\-bit integer containing the current GID and UID, and
|
|
created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of
|
|
\fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of
|
|
the executable (excluding the path) for the current task. The
|
|
\fIsize_of_buf\fP must be strictly positive. On success, the
|
|
helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure,
|
|
it is filled with zeroes.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Retrieve the classid for the current task, i.e. for the net_cls
|
|
cgroup to which \fIskb\fP belongs.
|
|
.sp
|
|
This helper can be used on TC egress path, but not on ingress.
|
|
.sp
|
|
The net_cls cgroup provides an interface to tag network packets
|
|
based on a user\-provided identifier for all traffic coming from
|
|
the tasks belonging to the related cgroup. See also the related
|
|
kernel documentation, available from the Linux sources in file
|
|
\fIDocumentation/cgroup\-v1/net_cls.txt\fP\&.
|
|
.sp
|
|
The Linux kernel has two versions for cgroups: there are
|
|
cgroups v1 and cgroups v2. Both are available to users, who can
|
|
use a mixture of them, but note that the net_cls cgroup is for
|
|
cgroup v1 only. This makes it incompatible with BPF programs
|
|
run on cgroups, which is a cgroup\-v2\-only feature (a socket can
|
|
only hold data for one version of cgroups at a time).
|
|
.sp
|
|
This helper is only available if the kernel was compiled with
|
|
the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to
|
|
"\fBy\fP" or to "\fBm\fP".
|
|
.TP
|
|
.B Return
|
|
The classid, or 0 for the default unconfigured classid.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Push a \fIvlan_tci\fP (VLAN tag control information) of protocol
|
|
\fIvlan_proto\fP to the packet associated to \fIskb\fP, then update
|
|
the checksum. Note that if \fIvlan_proto\fP is different from
|
|
\fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to
|
|
be \fBETH_P_8021Q\fP\&.
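.sp
For reference, a \fIvlan_tci\fP value can be assembled as in the sketch
below. The bit layout (3 bits of priority, 1 drop\-eligible bit, 12 bits
of VLAN id) comes from IEEE 802.1Q rather than from this page, and the
function name \fBvlan_tci\fP() is hypothetical.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* TCI layout per IEEE 802.1Q: PCP (3 bits), DEI (1 bit), VID (12 bits). */
uint16_t vlan_tci(unsigned int pcp, unsigned int dei, unsigned int vid)
{
    return (uint16_t)(((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | (vid & 0xfff));
}
```
.ft P
.fi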
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Pop a VLAN header from the packet associated to \fIskb\fP\&.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Get tunnel metadata. This helper takes a pointer \fIkey\fP to an
|
|
empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, that will be
|
|
filled with tunnel metadata for the packet associated to \fIskb\fP\&.
|
|
The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which
|
|
indicates that the tunnel is based on IPv6 protocol instead of
|
|
IPv4.
|
|
.sp
|
|
The \fBstruct bpf_tunnel_key\fP is an object that generalizes the
|
|
principal parameters used by various tunneling protocols into a
|
|
single struct. This way, it can be used to easily make a
|
|
decision based on the contents of the encapsulation header,
|
|
"summarized" in this struct. In particular, it holds the IP
|
|
address of the remote end (IPv4 or IPv6, depending on the case)
|
|
in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also,
|
|
this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is
|
|
generally mapped to a VNI (Virtual Network Identifier), making
|
|
it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper.
|
|
.sp
|
|
Let\(aqs imagine that the following code is part of a program
|
|
attached to the TC ingress interface, on one end of a GRE
|
|
tunnel, and is supposed to filter out all messages coming from
|
|
remote ends with IPv4 address other than 10.0.0.1:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
int ret;
struct bpf_tunnel_key key = {};

ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
if (ret < 0)
        return TC_ACT_SHOT;     // drop packet

if (key.remote_ipv4 != 0x0a000001)
        return TC_ACT_SHOT;     // drop packet

return TC_ACT_OK;               // accept packet
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
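.sp
The constant \fB0x0a000001\fP in the code above is 10.0.0.1 with the
first octet in the most significant byte, that is, the address as a
host\-order 32\-bit integer. A small sketch for building such a value,
with the hypothetical name \fBipv4_host_order\fP():
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Build a host-order 32-bit value from the four octets a.b.c.d. */
uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```
.ft P
.fi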
|
|
.sp
|
|
This interface can also be used with all encapsulation devices
|
|
that can operate in "collect metadata" mode: instead of having
|
|
one network device per specific configuration, the "collect
|
|
metadata" mode only requires a single device where the
|
|
configuration can be extracted from this helper.
|
|
.sp
|
|
This can be used together with various tunnels such as VXLAN,
|
|
Geneve, GRE or IP in IP (IPIP).
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Populate tunnel metadata for the packet associated to \fIskb\fP\&. The
|
|
tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The
|
|
\fIflags\fP can be set to a combination of the following values:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_F_TUNINFO_IPV6\fP
|
|
Indicate that the tunnel is based on IPv6 protocol
|
|
instead of IPv4.
|
|
.TP
|
|
.B \fBBPF_F_ZERO_CSUM_TX\fP
|
|
For IPv4 packets, add a flag to tunnel metadata
|
|
indicating that checksum computation should be skipped
|
|
and checksum set to zeroes.
|
|
.TP
|
|
.B \fBBPF_F_DONT_FRAGMENT\fP
|
|
Add a flag to tunnel metadata indicating that the
|
|
packet should not be fragmented.
|
|
.TP
|
|
.B \fBBPF_F_SEQ_NUMBER\fP
|
|
Add a flag to tunnel metadata indicating that a
|
|
sequence number should be added to tunnel header before
|
|
sending the packet. This flag was added for GRE
|
|
encapsulation, but might be used with other protocols
|
|
as well in the future.
|
|
.UNINDENT
|
|
.sp
|
|
Here is a typical usage on the transmit path:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
struct bpf_tunnel_key key;
/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.sp
|
|
See also the description of the \fBbpf_skb_get_tunnel_key\fP()
|
|
helper for additional information.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Read the value of a perf event counter. This helper relies on a
|
|
\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of
|
|
the perf event counter is selected when \fImap\fP is updated with
|
|
perf event file descriptors. The \fImap\fP is an array whose size
|
|
is the number of available CPUs, and each cell contains a value
|
|
relative to one CPU. The value to retrieve is indicated by
|
|
\fIflags\fP, that contains the index of the CPU to look up, masked
|
|
with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
|
|
\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
|
|
current CPU should be retrieved.
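.sp
The two ways of selecting a counter can be sketched as follows. The
constants mirror \fIinclude/uapi/linux/bpf.h\fP and are redefined for a
user\-space build; the function name \fBperf_flags_for_cpu\fP() is
hypothetical.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Values mirroring include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/* Flags selecting the map cell that belongs to one given CPU. */
uint64_t perf_flags_for_cpu(uint32_t cpu)
{
    return (uint64_t)cpu & BPF_F_INDEX_MASK;
}
```
.ft P
.fi
.sp
Passing \fBBPF_F_CURRENT_CPU\fP instead avoids having to look up the
index of the running CPU first.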
|
|
.sp
|
|
Note that before Linux 4.13, only hardware perf events can be
|
|
retrieved.
|
|
.sp
|
|
Also, be aware that the newer helper
|
|
\fBbpf_perf_event_read_value\fP() is recommended over
|
|
\fBbpf_perf_event_read\fP() in general. The latter has some ABI
|
|
quirks where error and counter value are used as a return code
|
|
(which is wrong to do since ranges may overlap). This issue is
|
|
fixed with \fBbpf_perf_event_read_value\fP(), which at the same
|
|
time provides more features than the \fBbpf_perf_event_read\fP()
interface. Please refer to the description of
|
|
\fBbpf_perf_event_read_value\fP() for details.
|
|
.TP
|
|
.B Return
|
|
The value of the perf event counter read from the map, or a
|
|
negative error code in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Redirect the packet to another net device of index \fIifindex\fP\&.
|
|
This helper is somewhat similar to \fBbpf_clone_redirect\fP(), except that the packet is not cloned, which provides
|
|
increased performance.
|
|
.sp
|
|
Except for XDP, both ingress and egress interfaces can be used
|
|
for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used
|
|
to make the distinction (ingress path is selected if the flag
|
|
is present, egress path otherwise). Currently, XDP only
|
|
supports redirection to the egress interface, and accepts no
|
|
flag at all.
|
|
.sp
|
|
The same effect can be attained with the more generic
|
|
\fBbpf_redirect_map\fP(), which requires specific maps to be
|
|
used but offers better performance.
|
|
.TP
|
|
.B Return
|
|
For XDP, the helper returns \fBXDP_REDIRECT\fP on success or
|
|
\fBXDP_ABORTED\fP on error. For other program types, the values
|
|
are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on
|
|
error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Retrieve the realm of the route, that is to say the
\fBtclassid\fP field of the destination for the \fIskb\fP\&. The
identifier retrieved is a user\-provided tag, similar to the
|
|
one used with the net_cls cgroup (see description for
|
|
\fBbpf_get_cgroup_classid\fP() helper), but here this tag is
|
|
held by a route (a destination entry), not by a task.
|
|
.sp
|
|
Retrieving this identifier works with the clsact TC egress hook
|
|
(see also \fBtc\-bpf(8)\fP), or alternatively on conventional
|
|
classful egress qdiscs, but not on TC ingress path. In case of
|
|
clsact TC egress hook, this has the advantage that, internally,
|
|
the destination entry has not been dropped yet in the transmit
|
|
path. Therefore, the destination entry does not need to be
|
|
artificially held via \fBnetif_keep_dst\fP() for a classful
|
|
qdisc until the \fIskb\fP is freed.
|
|
.sp
|
|
This helper is available only if the kernel was compiled with
|
|
\fBCONFIG_IP_ROUTE_CLASSID\fP configuration option.
|
|
.TP
|
|
.B Return
|
|
The realm of the route for the packet associated to \fIskb\fP, or 0
|
|
if none was found.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Write raw \fIdata\fP blob into a special BPF perf event held by
|
|
\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf
|
|
event must have the following attributes: \fBPERF_SAMPLE_RAW\fP
|
|
as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and
|
|
\fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&.
|
|
.sp
|
|
The \fIflags\fP are used to indicate the index in \fImap\fP for which
|
|
the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&.
|
|
Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP
|
|
to indicate that the index of the current CPU core should be
|
|
used.
|
|
.sp
|
|
The value to write, of \fIsize\fP, is passed through the eBPF stack and
pointed to by \fIdata\fP\&.
.sp
The context of the program \fIctx\fP also needs to be passed to the
helper.
|
|
.sp
|
|
In user space, a program willing to read the values needs to
|
|
call \fBperf_event_open\fP() on the perf event (either for
|
|
one or for all CPUs) and to store the file descriptor into the
|
|
\fImap\fP\&. This must be done before the eBPF program can send data
|
|
into it. An example is available in file
|
|
\fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source
|
|
tree (the eBPF program counterpart is in
|
|
\fIsamples/bpf/trace_output_kern.c\fP).
|
|
.sp
|
|
\fBbpf_perf_event_output\fP() achieves better performance
|
|
than \fBbpf_trace_printk\fP() for sharing data with user
|
|
space, and is much better suited for streaming data from eBPF
|
|
programs.
|
|
.sp
|
|
Note that this helper is not restricted to tracing use cases
|
|
and can be used with programs attached to TC or XDP as well,
|
|
where it allows for passing data to user space listeners. Data
|
|
can be:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
Only custom structs,
|
|
.IP \(bu 2
|
|
Only the packet payload, or
|
|
.IP \(bu 2
|
|
A combination of both.
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper was provided as an easy way to load data from a
|
|
packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from
|
|
the packet associated with \fIskb\fP, into the buffer pointed to by
|
|
\fIto\fP\&.
|
|
.sp
|
|
Since Linux 4.7, usage of this helper has mostly been replaced
|
|
by "direct packet access", enabling packet data to be
|
|
manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP
|
|
pointing respectively to the first byte of packet data and to
|
|
the byte after the last byte of packet data. However, it
|
|
remains useful if one wishes to read large quantities of data
|
|
at once from a packet into the eBPF stack.
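.sp
The bounds test that direct packet access relies on can be
illustrated in plain user\-space C. This is a toy model under stated
assumptions, not kernel code: the function name load_bytes_checked()
is hypothetical, and the \fIdata\fP and \fIdata_end\fP parameters
merely stand in for \fIskb\fP\fB\->data\fP and
\fIskb\fP\fB\->data_end\fP\&.
.sp
.nf
.ft C
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model (not kernel code): copy len bytes found at offset in the
 * packet delimited by [data, data_end) into to, after checking that
 * the requested range lies fully inside the packet.
 * Returns 0 on success, -1 if the range is out of bounds.
 */
int load_bytes_checked(const uint8_t *data, const uint8_t *data_end,
                       uint32_t offset, void *to, uint32_t len)
{
        size_t avail = (size_t)(data_end - data);

        if (offset > avail || len > avail - offset)
                return -1;
        memcpy(to, data + offset, len);
        return 0;
}
```
.ft P
.fi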
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Walk a user or a kernel stack and return its id. To achieve
|
|
this, the helper needs \fIctx\fP, which is a pointer to the context
|
|
on which the tracing program is executed, and a pointer to a
|
|
\fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&.
|
|
.sp
|
|
The last argument, \fIflags\fP, holds the number of stack frames to
|
|
skip (from 0 to 255), masked with
|
|
\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
|
|
a combination of the following flags:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_F_USER_STACK\fP
|
|
Collect a user space stack instead of a kernel stack.
|
|
.TP
|
|
.B \fBBPF_F_FAST_STACK_CMP\fP
|
|
Compare stacks by hash only.
|
|
.TP
|
|
.B \fBBPF_F_REUSE_STACKID\fP
|
|
If two different stacks hash into the same \fIstackid\fP,
|
|
discard the old one.
|
|
.UNINDENT
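.sp
The layout of \fIflags\fP described above can be sketched as follows.
The constants are the values found in
\fIinclude/uapi/linux/bpf.h\fP; the function name stackid_flags() is
hypothetical and exists only for this illustration.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define BPF_F_REUSE_STACKID   (1ULL << 10)

/* Hypothetical helper: combine a frame skip count with option bits. */
uint64_t stackid_flags(uint32_t skip, uint64_t options)
{
        return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) | options;
}
```
.ft P
.fi
.sp
For example, stackid_flags(2, BPF_F_USER_STACK) requests a user space
stack with the two innermost frames skipped.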
|
|
.sp
|
|
The stack id retrieved is a 32\-bit integer handle which
|
|
can be further combined with other data (including other stack
|
|
ids) and used as a key into maps. This can be useful for
|
|
generating a variety of graphs (such as flame graphs or off\-cpu
|
|
graphs).
|
|
.sp
|
|
For walking a stack, this helper is an improvement over
|
|
\fBbpf_probe_read\fP(), which can be used with unrolled loops
|
|
but is not efficient and consumes a lot of eBPF instructions.
|
|
Instead, \fBbpf_get_stackid\fP() can collect up to
|
|
\fBPERF_MAX_STACK_DEPTH\fP kernel and user frames. Note that
|
|
this limit can be controlled with the \fBsysctl\fP program, and
|
|
that it should be manually increased in order to profile long
|
|
user stacks (such as stacks for Java programs). To do so, use:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
# sysctl kernel.perf_event_max_stack=<new value>
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
The positive or null stack id on success, or a negative error
|
|
in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Compute a checksum difference, from the raw buffer pointed to by
\fIfrom\fP, of length \fIfrom_size\fP (that must be a multiple of 4),
towards the raw buffer pointed to by \fIto\fP, of size \fIto_size\fP
|
|
(same remark). An optional \fIseed\fP can be added to the value
|
|
(this can be cascaded, the seed may come from a previous call
|
|
to the helper).
|
|
.sp
|
|
This is flexible enough to be used in several ways:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to
|
|
checksum, it can be used when pushing new data.
|
|
.IP \(bu 2
|
|
With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to
|
|
checksum, it can be used when removing data from a packet.
|
|
.IP \(bu 2
|
|
With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it
|
|
can be used to compute a diff. Note that \fIfrom_size\fP and
|
|
\fIto_size\fP do not need to be equal.
|
|
.UNINDENT
|
|
.sp
|
|
This helper can be used in combination with
|
|
\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to
|
|
which one can feed in the difference computed with
|
|
\fBbpf_csum_diff\fP().
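.sp
The arithmetic behind a checksum difference can be sketched in
user\-space C. This is a toy model of the ones' complement sum, not
the kernel implementation: the names csum_add32(), csum_diff_model()
and csum_fold16() are hypothetical, words are handled in host byte
order, and csum_fold16() does not complement its result.
.sp
.nf
.ft C
```c
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit values with end-around carry (ones' complement). */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
        uint64_t s = (uint64_t)a + b;

        return (uint32_t)(s & 0xffffffffu) + (uint32_t)(s >> 32);
}

/*
 * Toy model of the arithmetic: seed + sum(~from[i]) + sum(to[i]).
 * Sizes are in bytes and assumed to be multiples of 4.
 */
uint32_t csum_diff_model(const uint32_t *from, size_t from_size,
                         const uint32_t *to, size_t to_size,
                         uint32_t seed)
{
        uint32_t sum = seed;
        size_t i;

        for (i = 0; i < from_size / 4; i++)
                sum = csum_add32(sum, ~from[i]);
        for (i = 0; i < to_size / 4; i++)
                sum = csum_add32(sum, to[i]);
        return sum;
}

/* Fold a 32-bit ones' complement sum down to 16 bits. */
uint16_t csum_fold16(uint32_t sum)
{
        sum = (sum & 0xffff) + (sum >> 16);
        sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}
```
.ft P
.fi
.sp
With \fIfrom_size\fP == 0 the model sums new data onto the seed; with
both sizes set, feeding the previous sum as seed yields the sum of
the updated buffer without rescanning the unchanged words.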
|
|
.TP
|
|
.B Return
|
|
The checksum result, or a negative error code in case of
|
|
failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Retrieve tunnel options metadata for the packet associated to
|
|
\fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP
|
|
of \fIsize\fP\&.
|
|
.sp
|
|
This helper can be used with encapsulation devices that can
|
|
operate in "collect metadata" mode (please refer to the related
|
|
note in the description of \fBbpf_skb_get_tunnel_key\fP() for
|
|
more details). A particular example where this can be used is
|
|
in combination with the Geneve encapsulation protocol, where it
|
|
allows for pushing (with the \fBbpf_skb_set_tunnel_opt\fP() helper)
|
|
and retrieving arbitrary TLVs (Type\-Length\-Value headers) from
|
|
the eBPF program. This allows for full customization of these
|
|
headers.
|
|
.TP
|
|
.B Return
|
|
The size of the option data retrieved.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Set tunnel options metadata for the packet associated to \fIskb\fP
|
|
to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&.
|
|
.sp
|
|
See also the description of the \fBbpf_skb_get_tunnel_opt\fP()
|
|
helper for additional information.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently
|
|
supported are transition from IPv4 to IPv6, and from IPv6 to
|
|
IPv4. The helper takes care of the groundwork for the
|
|
transition, including resizing the socket buffer. The eBPF
|
|
program is expected to fill the new headers, if any, via
\fBbpf_skb_store_bytes\fP() and to recompute the checksums with
\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The
main use case for this helper is to perform NAT64 operations out of
an eBPF program.
|
|
.sp
|
|
Internally, the GSO type is marked as dodgy so that headers are
|
|
checked and segments are recalculated by the GSO/GRO engine.
|
|
The size for GSO target is adapted as well.
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Change the packet type for the packet associated to \fIskb\fP\&. This
|
|
comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except
that the eBPF program has no write access to
\fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a
helper here allows for graceful handling of errors.
|
|
.sp
|
|
The major use case is to change incoming \fIskb\fPs to
\fBPACKET_HOST\fP in a programmatic way instead of having to
recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for
|
|
example.
|
|
.sp
|
|
Note that \fItype\fP only allows certain values. At this time, they
|
|
are:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBPACKET_HOST\fP
|
|
Packet is for us.
|
|
.TP
|
|
.B \fBPACKET_BROADCAST\fP
|
|
Send packet to all.
|
|
.TP
|
|
.B \fBPACKET_MULTICAST\fP
|
|
Send packet to group.
|
|
.TP
|
|
.B \fBPACKET_OTHERHOST\fP
|
|
Send packet to someone else.
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Check whether \fIskb\fP is a descendant of the cgroup2 held by
|
|
\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
|
|
.TP
|
|
.B Return
|
|
The return value depends on the result of the test, and can be:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
0, if the \fIskb\fP failed the cgroup2 descendant test.
|
|
.IP \(bu 2
|
|
1, if the \fIskb\fP passed the cgroup2 descendant test.
|
|
.IP \(bu 2
|
|
A negative error code, if an error occurred.
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is
|
|
not set, in particular if the hash was cleared due to mangling,
|
|
recompute this hash. Later accesses to the hash can be done
|
|
directly with \fIskb\fP\fB\->hash\fP\&.
|
|
.sp
|
|
Calling \fBbpf_set_hash_invalid\fP(), changing a packet
protocol with \fBbpf_skb_change_proto\fP(), or calling
\fBbpf_skb_store_bytes\fP() with the
\fBBPF_F_INVALIDATE_HASH\fP flag are actions that can clear
the hash and trigger a new computation for the next call to
|
|
\fBbpf_get_hash_recalc\fP().
|
|
.TP
|
|
.B Return
|
|
The 32\-bit hash.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_current_task(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Return
|
|
A pointer to the current task struct.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Attempt in a safe way to write \fIlen\fP bytes from the buffer
|
|
\fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in
|
|
user context, and \fIdst\fP must be a valid user space address.
|
|
.sp
|
|
This helper should not be used to implement any kind of
|
|
security mechanism because of TOC\-TOU attacks, but rather to
|
|
debug, divert, and manipulate execution of semi\-cooperative
|
|
processes.
|
|
.sp
|
|
Keep in mind that this feature is meant for experiments, and it
|
|
has a risk of crashing the system and running programs.
|
|
Therefore, when an eBPF program using this helper is attached,
|
|
a warning including PID and process name is printed to kernel
|
|
logs.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Check whether the probe is being run in the context of a given
|
|
subset of the cgroup2 hierarchy. The cgroup2 to test is held by
|
|
\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
|
|
.TP
|
|
.B Return
|
|
The return value depends on the result of the test, and can be:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
0, if the current task does not belong to the cgroup2.
.IP \(bu 2
1, if the current task belongs to the cgroup2.
|
|
.IP \(bu 2
|
|
A negative error code, if an error occurred.
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Resize (trim or grow) the packet associated to \fIskb\fP to the
|
|
new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
The basic idea is that the helper performs the needed work to
|
|
change the size of the packet, then the eBPF program rewrites
|
|
the rest via helpers like \fBbpf_skb_store_bytes\fP(),
|
|
\fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP()
and others. This helper is a slow path utility intended for
replies with control messages. Because it targets the
|
|
slow path, the helper itself can afford to be slow: it
|
|
implicitly linearizes, unclones and drops offloads from the
|
|
\fIskb\fP\&.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Pull in non\-linear data in case the \fIskb\fP is non\-linear and not
|
|
all of the \fIlen\fP bytes are part of the linear section. Make \fIlen\fP bytes
|
|
from \fIskb\fP readable and writable. If a zero value is passed for
|
|
\fIlen\fP, then the whole length of the \fIskb\fP is pulled.
|
|
.sp
|
|
This helper is only needed for reading and writing with direct
|
|
packet access.
|
|
.sp
|
|
For direct packet access, testing that offsets to access
|
|
are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) is
|
|
susceptible to fail if offsets are invalid, or if the requested
|
|
data is in non\-linear parts of the \fIskb\fP\&. On failure the
|
|
program can just bail out, or in the case of a non\-linear
|
|
buffer, use a helper to make the data available. The
|
|
\fBbpf_skb_load_bytes\fP() helper is a first solution to access
|
|
the data. Another one consists in using \fBbpf_skb_pull_data\fP()
to pull in the non\-linear parts once, then retesting and
finally accessing the data.
|
|
.sp
|
|
At the same time, this also makes sure the \fIskb\fP is uncloned,
|
|
which is a necessary condition for direct write. As this needs
|
|
to be an invariant for the write part only, the verifier
|
|
detects writes and adds a prologue that calls
|
|
\fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from
|
|
the very beginning in case it is indeed cloned.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the
|
|
driver has supplied a checksum for the entire packet into that
|
|
field. Return an error otherwise. This helper is intended to be
|
|
used in combination with \fBbpf_csum_diff\fP(), in particular
|
|
when the checksum needs to be updated after data has been
|
|
written into the packet through direct packet access.
|
|
.TP
|
|
.B Return
|
|
The checksum on success, or a negative error code in case of
|
|
failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after
|
|
mangling on headers through direct packet access, in order to
|
|
indicate that the hash is outdated and to trigger a
|
|
recalculation the next time the kernel tries to access this
|
|
hash or when the \fBbpf_get_hash_recalc\fP() helper is called.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_get_numa_node_id(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Return the id of the current NUMA node. The primary use case
|
|
for this helper is the selection of sockets for the local NUMA
|
|
node, when the program is attached to sockets using the
|
|
\fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP),
|
|
but the helper is also available to other eBPF program types,
|
|
similarly to \fBbpf_get_smp_processor_id\fP().
|
|
.TP
|
|
.B Return
|
|
The id of current NUMA node.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Grow the headroom of the packet associated with \fIskb\fP and adjust the
|
|
offset of the MAC header accordingly, adding \fIlen\fP bytes of
|
|
space. It automatically extends and reallocates memory as
|
|
required.
|
|
.sp
|
|
This helper can be used on a layer 3 \fIskb\fP to push a MAC header
|
|
for redirection into a layer 2 device.
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that
|
|
it is possible to use a negative value for \fIdelta\fP\&. This helper
|
|
can be used to prepare the packet for pushing or popping
|
|
headers.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Copy a NUL\-terminated string from an unsafe address
|
|
\fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the
|
|
terminating NUL byte. In case the string length is smaller than
|
|
\fIsize\fP, the target is not padded with further NUL bytes. If the
|
|
string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are
|
|
copied and the last byte is set to NUL.
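.sp
The copy semantics described above can be modelled in user\-space C.
This is only a sketch: the function name read_str_model() is
hypothetical, it performs no fault handling (unlike the real helper),
and returning 0 without writing anything when \fIsize\fP is not
positive is a choice made for this model.
.sp
.nf
.ft C
```c
/*
 * Toy user-space model of the copy semantics. Copies at most
 * size - 1 bytes from src, always NUL-terminates dst, and returns
 * the number of bytes written including the trailing NUL.
 * Returns 0 without touching dst if size is not strictly positive.
 */
int read_str_model(char *dst, int size, const char *src)
{
        int i;

        if (size <= 0)
                return 0;
        for (i = 0; i < size - 1 && src[i] != 0; i++)
                dst[i] = src[i];
        dst[i] = 0;
        return i + 1;
}
```
.ft P
.fi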
|
|
.sp
|
|
On success, the length of the copied string is returned. This
|
|
makes this helper useful in tracing programs for reading
|
|
strings, and more importantly to get their length at runtime. See
|
|
the following snippet:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
SEC("kprobe/sys_open")
int bpf_sys_open(struct pt_regs *ctx)
{
        char buf[PATHLEN]; // PATHLEN is defined to 256
        // ctx\->di holds the first argument on x86\-64.
        int res = bpf_probe_read_str(buf, sizeof(buf),
                                     (void *)ctx\->di);

        // Consume buf, for example push it to
        // userspace via bpf_perf_event_output(); we
        // can use res (the string length) as event
        // size, after checking its boundaries.
        return 0;
}
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.sp
|
|
In comparison, using the \fBbpf_probe_read\fP() helper here instead
to read the string would require estimating the length at
|
|
compile time, and would often result in copying more memory
|
|
than necessary.
|
|
.sp
|
|
Another useful use case is parsing individual process
arguments or individual environment variables by navigating
\fIcurrent\fP\fB\->mm\->arg_start\fP and
\fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the
return value, one can quickly iterate at the right offset of the
memory area.
|
|
.TP
|
|
.B Return
|
|
On success, the strictly positive length of the string,
|
|
including the trailing NUL character. On error, a negative
|
|
value.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
If the \fBstruct sk_buff\fP pointed to by \fIskb\fP has a known socket,
|
|
retrieve the cookie (generated by the kernel) of this socket.
|
|
If no cookie has been set yet, generate a new cookie. Once
|
|
generated, the socket cookie remains stable for the life of the
|
|
socket. This helper can be useful for monitoring per socket
|
|
networking traffic statistics as it provides a unique socket
|
|
identifier per namespace.
|
|
.TP
|
|
.B Return
|
|
An 8\-byte long non\-decreasing number on success, or 0 if the
|
|
socket field is missing inside \fIskb\fP\&.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_addr\fP context.
|
|
.TP
|
|
.B Return
|
|
An 8\-byte long non\-decreasing number.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_ops\fP context.
|
|
.TP
|
|
.B Return
|
|
An 8\-byte long non\-decreasing number.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Return
|
|
The owner UID of the socket associated to \fIskb\fP\&. If the socket
|
|
is \fBNULL\fP, or if it is not a full socket (i.e. if it is a
|
|
time\-wait or a request socket instead), \fBoverflowuid\fP value
|
|
is returned (note that \fBoverflowuid\fP might also be the actual
|
|
UID value for the socket).
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP)
|
|
to value \fIhash\fP\&.
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Emulate a call to \fBsetsockopt()\fP on the socket associated to
|
|
\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
|
|
which the option resides and the name \fIoptname\fP of the option
|
|
must be specified, see \fBsetsockopt(2)\fP for more information.
|
|
The option value of length \fIoptlen\fP is pointed to by \fIoptval\fP\&.
|
|
.sp
|
|
This helper actually implements a subset of \fBsetsockopt()\fP\&.
|
|
It supports the following \fIlevel\fPs:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
\fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs:
|
|
\fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP,
|
|
\fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&.
|
|
.IP \(bu 2
|
|
\fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs:
|
|
\fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP,
|
|
\fBTCP_BPF_SNDCWND_CLAMP\fP\&.
|
|
.IP \(bu 2
|
|
\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
|
|
.IP \(bu 2
|
|
\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Grow or shrink the room for data in the packet associated to
|
|
\fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&.
|
|
.sp
|
|
There is a single supported mode at this time:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
\fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer
|
|
(room space is added or removed below the layer 3 header).
|
|
.UNINDENT
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Redirect the packet to the endpoint referenced by \fImap\fP at
|
|
index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain
|
|
references to net devices (for forwarding packets through other
|
|
ports), or to CPUs (for redirecting XDP frames to another CPU;
|
|
but this is only implemented for native XDP (with driver
|
|
support) as of this writing).
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
When used to redirect packets to net devices, this helper
|
|
provides a high performance increase over \fBbpf_redirect\fP().
|
|
This is due to various implementation details of the underlying
|
|
mechanisms, one of which is the fact that
\fBbpf_redirect_map\fP() tries to send packets in bulk to the device.
|
|
.TP
|
|
.B Return
|
|
\fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Redirect the packet to the socket referenced by \fImap\fP (of type
|
|
\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
|
|
egress interfaces can be used for redirection. The
|
|
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
|
|
distinction (ingress path is selected if the flag is present,
|
|
egress path otherwise). This is the only flag supported for now.
|
|
.TP
|
|
.B Return
|
|
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Add an entry to, or update a \fImap\fP referencing sockets. The
|
|
\fIskops\fP is used as a new value for the entry associated to
|
|
\fIkey\fP\&. \fIflags\fP is one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_NOEXIST\fP
|
|
The entry for \fIkey\fP must not exist in the map.
|
|
.TP
|
|
.B \fBBPF_EXIST\fP
|
|
The entry for \fIkey\fP must already exist in the map.
|
|
.TP
|
|
.B \fBBPF_ANY\fP
|
|
No condition on the existence of the entry for \fIkey\fP\&.
|
|
.UNINDENT
|
|
.sp
|
|
If the \fImap\fP has eBPF programs (parser and verdict), those will
|
|
be inherited by the socket being added. If the socket is
|
|
already attached to eBPF programs, this results in an error.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Adjust the address pointed to by \fIxdp_md\fP\fB\->data_meta\fP by
|
|
\fIdelta\fP (which can be positive or negative). Note that this
|
|
operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP,
|
|
so the latter must be loaded only after the helper has been
|
|
called.
|
|
.sp
|
|
The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs
|
|
are not required to use it. The rationale is that when the
|
|
packet is processed with XDP (e.g. as DoS filter), it is
|
|
possible to push further meta data along with it before passing
|
|
to the stack, and to give the guarantee that an ingress eBPF
|
|
program attached as a TC classifier on the same device can pick
|
|
this up for further post\-processing. Since TC works with socket
|
|
buffers, it remains possible to set from XDP the \fBmark\fP or
|
|
\fBpriority\fP pointers, or other pointers for the socket buffer.
|
|
Having this scratch space generic and programmable allows for
|
|
more flexibility as the user is free to store whatever meta
|
|
data they need.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Read the value of a perf event counter, and store it into \fIbuf\fP
|
|
of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type
|
|
\fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event
|
|
counter is selected when \fImap\fP is updated with perf event file
|
|
descriptors. The \fImap\fP is an array whose size is the number of
|
|
available CPUs, and each cell contains a value relative to one
|
|
CPU. The value to retrieve is indicated by \fIflags\fP, that
|
|
contains the index of the CPU to look up, masked with
|
|
\fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
|
|
\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
|
|
current CPU should be retrieved.
|
|
.sp
|
|
This helper behaves in a way close to
|
|
\fBbpf_perf_event_read\fP() helper, save that instead of
|
|
just returning the value observed, it fills the \fIbuf\fP
|
|
structure. This allows for additional data to be retrieved: in
|
|
particular, the enabled and running times (in
\fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are
|
|
copied. In general, \fBbpf_perf_event_read_value\fP() is
|
|
recommended over \fBbpf_perf_event_read\fP(), which has some
|
|
ABI issues and provides less functionality.
|
|
.sp
|
|
These values are interesting, because hardware PMU (Performance
|
|
Monitoring Unit) counters are limited resources. When there are
|
|
more PMU based perf events opened than available counters,
|
|
the kernel multiplexes these events so that each event gets a certain
percentage (but not all) of the PMU time. When multiplexing
happens, the number of samples or the counter value does not
reflect what would be observed without multiplexing.
This makes comparison between different runs difficult.
|
|
Typically, the counter value should be normalized before
|
|
comparing to other experiments. The usual normalization is done
|
|
as follows.
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
normalized_counter = counter * t_enabled / t_running
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.sp
|
|
Here, t_enabled is the time the event was enabled and t_running is
the time the event was running since the last normalization. The
|
|
enabled and running times are accumulated since the perf event
|
|
open. To obtain the scaling factor between two invocations of an
eBPF program, users can use the CPU id as the key (which is
|
|
typical for perf array usage model) to remember the previous
|
|
value and do the calculation inside the eBPF program.
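.sp
The normalization between two readings can be sketched in
user\-space C as follows. The structure mirrors the three fields of
\fBstruct bpf_perf_event_value\fP; the names \fBstruct perf_value\fP
and normalized_delta() are hypothetical, and returning 0 when the
event did not run is a choice made for this sketch.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Same three fields as struct bpf_perf_event_value. */
struct perf_value {
        uint64_t counter;
        uint64_t enabled;
        uint64_t running;
};

/*
 * Scale the counter increase between two readings by the ratio of
 * enabled time to running time over the same interval.
 * Returns 0 if the event did not run at all in the interval.
 */
uint64_t normalized_delta(const struct perf_value *prev,
                          const struct perf_value *cur)
{
        uint64_t d_counter = cur->counter - prev->counter;
        uint64_t d_enabled = cur->enabled - prev->enabled;
        uint64_t d_running = cur->running - prev->running;

        if (d_running == 0)
                return 0;
        return d_counter * d_enabled / d_running;
}
```
.ft P
.fi
.sp
Note that the multiplication can overflow 64 bits for large deltas,
so a real program may need to rescale the operands first.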
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
For an eBPF program attached to a perf event, retrieve the
|
|
value of the event counter associated to \fIctx\fP and store it in
|
|
the structure pointed to by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled
|
|
and running times are also stored in the structure (see
|
|
description of helper \fBbpf_perf_event_read_value\fP() for
|
|
more details).
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Emulate a call to \fBgetsockopt()\fP on the socket associated to
|
|
\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
|
|
which the option resides and the name \fIoptname\fP of the option
|
|
must be specified, see \fBgetsockopt(2)\fP for more information.
|
|
The retrieved value is stored in the structure pointed to by
|
|
\fIoptval\fP and of length \fIoptlen\fP\&.
|
|
.sp
|
|
This helper actually implements a subset of \fBgetsockopt()\fP\&.
|
|
It supports the following \fIlevel\fPs:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
\fBIPPROTO_TCP\fP, which supports \fIoptname\fP
|
|
\fBTCP_CONGESTION\fP\&.
|
|
.IP \(bu 2
|
|
\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
|
|
.IP \(bu 2
|
|
\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
|
|
.UNINDENT
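.sp
The supported combinations listed above can be summarized as a small
lookup. The following sketch is ordinary user\-space C and the
function name is hypothetical; it only restates the list and is not
meant to reflect how the kernel implements the check.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Returns 1 if the (level, optname) pair is one of those listed
 * above, 0 otherwise. Illustration only, not kernel code. */
static int helper_supports_option(int level, int optname)
{
	switch (level) {
	case IPPROTO_TCP:
		return optname == TCP_CONGESTION;
	case IPPROTO_IP:
		return optname == IP_TOS;
	case IPPROTO_IPV6:
		return optname == IPV6_TCLASS;
	default:
		return 0;
	}
}
```
.ft P
.fi
.UNINDENT
.UNINDENT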
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Used for error injection, this helper uses kprobes to override
|
|
the return value of the probed function, and to set it to \fIrc\fP\&.
|
|
The first argument is the context \fIregs\fP on which the kprobe
|
|
works.
|
|
.sp
|
|
This helper works by setting the PC (program counter)
|
|
to an override function which is run in place of the original
|
|
probed function. This means the probed function is not run at
|
|
all. The replacement function just returns with the required
|
|
value.
|
|
.sp
|
|
This helper has security implications, and thus is subject to
|
|
restrictions. It is only available if the kernel was compiled
|
|
with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration
|
|
option, and in this case it only works on functions tagged with
|
|
\fBALLOW_ERROR_INJECTION\fP in the kernel code.
|
|
.sp
|
|
Also, the helper is only available for the architectures having
|
|
the \fBCONFIG_FUNCTION_ERROR_INJECTION\fP option. As of this writing,
the x86 architecture is the only one to support this feature.
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field
|
|
for the full TCP socket associated to \fIbpf_sock\fP to
|
|
\fIargval\fP\&.
|
|
.sp
|
|
The primary use of this field is to determine if there should
|
|
be calls to eBPF programs of type
|
|
\fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP
|
|
code. A program of the same type can change its value, per
|
|
connection and as necessary, when the connection is
|
|
established. This field is directly accessible for reading, but
|
|
this helper must be used for updates in order to return an
|
|
error if an eBPF program tries to set a callback that is not
|
|
supported in the current kernel.
|
|
.sp
|
|
The supported callback values that \fIargval\fP can combine are:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
\fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out)
|
|
.IP \(bu 2
|
|
\fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission)
|
|
.IP \(bu 2
|
|
\fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change)
|
|
.UNINDENT
|
|
.sp
|
|
Here are some examples of where such an eBPF program
could be called:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
When RTO fires.
|
|
.IP \(bu 2
|
|
When a packet is retransmitted.
|
|
.IP \(bu 2
|
|
When the connection terminates.
|
|
.IP \(bu 2
|
|
When a packet is sent.
|
|
.IP \(bu 2
|
|
When a packet is received.
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
Code \fB\-EINVAL\fP if the socket is not a full TCP socket;
|
|
otherwise, a positive number containing the bits that could not
|
|
be set is returned (which comes down to 0 if all bits were set
|
|
as required).
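.sp
The return value described above can be modelled in user\-space C as
shown below. The names are hypothetical and the numeric flag values
are assumptions made for the sketch; the real definitions are in
\fIinclude/uapi/linux/bpf.h\fP\&.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
/* Local mirrors of the three flags listed in the description; the
 * numeric values are assumptions made for this sketch. */
#define SKETCH_RTO_CB_FLAG     (1 << 0)
#define SKETCH_RETRANS_CB_FLAG (1 << 1)
#define SKETCH_STATE_CB_FLAG   (1 << 2)
#define SKETCH_ALL_CB_FLAGS \
	(SKETCH_RTO_CB_FLAG | SKETCH_RETRANS_CB_FLAG | SKETCH_STATE_CB_FLAG)

/* Store the supported bits of argval in *cb_flags and return the
 * requested bits that could not be set (0 if all were set). */
static int sketch_cb_flags_set(int *cb_flags, int argval)
{
	*cb_flags = argval & SKETCH_ALL_CB_FLAGS;
	return argval & ~SKETCH_ALL_CB_FLAGS;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A caller can compare the result with zero to detect callbacks that the
running kernel does not support.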
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is used in programs implementing policies at the
|
|
socket level. If the message \fImsg\fP is allowed to pass (i.e. if
|
|
the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
|
|
the socket referenced by \fImap\fP (of type
|
|
\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
|
|
egress interfaces can be used for redirection. The
|
|
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
|
|
distinction (ingress path is selected if the flag is present,
|
|
egress path otherwise). This is the only flag supported for now.
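.sp
The way \fIflags\fP selects the path can be sketched in user\-space C as
follows. All names are hypothetical, the value used for the ingress
flag is an assumption, and treating any other bit as an error is
inferred from the statement that only one flag is supported.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

#define SKETCH_F_INGRESS (1ULL << 0) /* assumed value of BPF_F_INGRESS */

enum sketch_verdict { SKETCH_DROP = 0, SKETCH_PASS = 1 };
enum sketch_path { SKETCH_EGRESS = 0, SKETCH_INGRESS = 1 };

/* Pick the redirection path from flags. Any bit other than the
 * ingress flag is rejected, since it is the only flag supported. */
static enum sketch_verdict sketch_select_path(uint64_t flags,
					      enum sketch_path *path)
{
	if (flags & ~SKETCH_F_INGRESS)
		return SKETCH_DROP;
	*path = (flags & SKETCH_F_INGRESS) ? SKETCH_INGRESS : SKETCH_EGRESS;
	return SKETCH_PASS;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT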
|
|
.TP
|
|
.B Return
|
|
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
For socket policies, apply the verdict of the eBPF program to
|
|
the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&.
|
|
.sp
|
|
For example, this helper can be used in the following cases:
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
A single \fBsendmsg\fP() or \fBsendfile\fP() system call
|
|
contains multiple logical messages that the eBPF program is
|
|
supposed to read and for which it should apply a verdict.
|
|
.IP \(bu 2
|
|
An eBPF program only cares to read the first \fIbytes\fP of a
|
|
\fImsg\fP\&. If the message has a large payload, then setting up
|
|
and calling the eBPF program repeatedly for all bytes, even
|
|
though the verdict is already known, would create unnecessary
|
|
overhead.
|
|
.UNINDENT
|
|
.sp
|
|
When called from within an eBPF program, the helper sets a
|
|
counter internal to the BPF infrastructure, that is used to
|
|
apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is
|
|
smaller than the current data being processed from a
|
|
\fBsendmsg\fP() or \fBsendfile\fP() system call, the first
|
|
\fIbytes\fP will be sent and the eBPF program will be re\-run with
|
|
the pointer for start of data pointing to byte number \fIbytes\fP
|
|
\fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being
|
|
processed, then the eBPF verdict will be applied to multiple
|
|
\fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are
|
|
consumed.
|
|
.sp
|
|
Note that if a socket closes with the internal counter holding
|
|
a non\-zero value, this is not a problem because data is not
|
|
being buffered for \fIbytes\fP and is sent as it is received.
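.sp
The behaviour of the internal counter can be pictured with the
following user\-space model. It is a simplified sketch with
hypothetical names, not the kernel implementation: it only tracks how
many bytes of each chunk of data are covered by the last verdict.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Given the number of bytes still covered by the last verdict
 * (*apply_left) and the size of the data currently being processed
 * (chunk), return how many bytes of the chunk the verdict applies to
 * and decrease the counter accordingly. */
static uint32_t sketch_apply_verdict(uint32_t *apply_left, uint32_t chunk)
{
	uint32_t covered = chunk < *apply_left ? chunk : *apply_left;

	*apply_left -= covered;
	return covered;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
When the returned value is smaller than the chunk, the remainder is
what the eBPF program would be re\-run on. When the counter is still
non\-zero afterwards, the verdict carries over to the next call.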
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
For socket policies, prevent the execution of the verdict eBPF
|
|
program for message \fImsg\fP until \fIbytes\fP (byte number) have been
|
|
accumulated.
|
|
.sp
|
|
This can be used when one needs a specific number of bytes
|
|
before a verdict can be assigned, even if the data spans
|
|
multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme
|
|
case would be a user calling \fBsendmsg\fP() repeatedly with
|
|
1\-byte long message segments. Obviously, this is bad for
|
|
performance, but it is still valid. If the eBPF program needs
|
|
\fIbytes\fP bytes to validate a header, this helper can be used to
|
|
prevent the eBPF program from being called again until \fIbytes\fP have
|
|
been accumulated.
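.sp
The accumulation described above can be modelled in user\-space C as
follows. The structure and function are hypothetical and only
illustrate the counting; they do not buffer any data.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

struct sketch_cork {
	uint32_t needed;      /* value passed as bytes */
	uint32_t accumulated; /* bytes seen so far */
};

/* Account for len bytes coming from one send call. Returns 1 once
 * enough data has been accumulated for the verdict program to run,
 * and 0 while more data is still needed. */
static int sketch_cork_feed(struct sketch_cork *c, uint32_t len)
{
	c->accumulated += len;
	if (c->accumulated < c->needed)
		return 0;
	c->accumulated = 0;
	return 1;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With \fIneeded\fP set to 4 and a sender writing one byte at a time, the
model reports that the program runs only on the fourth call.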
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
For socket policies, pull in non\-linear data from user space
|
|
for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP byte offsets into \fImsg\fP,
|
|
respectively.
|
|
.sp
|
|
If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a
|
|
\fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP)
|
|
pointers have already consumed. For \fBsendmsg\fP() hooks this
|
|
is likely the first scatterlist element. But for calls relying
|
|
on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will
|
|
be the range (\fB0\fP, \fB0\fP) because the data is shared with
|
|
user space and by default the objective is to avoid allowing
|
|
user space to modify data while (or after) eBPF verdict is
|
|
being decided. This helper can be used to pull in data and to
|
|
set the start and end pointer to given values. Data will be
|
|
copied if necessary (i.e. if data was not linear and if start
|
|
and end pointers do not point to the same chunk).
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Bind the socket associated to \fIctx\fP to the address pointed to by
|
|
\fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing
|
|
connections from the desired IP address, which can be useful for
|
|
example when all processes inside a cgroup should use one
|
|
single IP address on a host that has multiple IP addresses configured.
|
|
.sp
|
|
This helper works for IPv4 and IPv6, TCP and UDP sockets. The
|
|
domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or
|
|
\fBAF_INET6\fP). Looking for a free port to bind to can be
|
|
expensive, therefore binding to a port is not permitted by the
|
|
helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively)
|
|
must be set to zero.
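.sp
Preparing an address that satisfies these constraints can be sketched
as follows. The function name is hypothetical and the code is plain
user\-space C using \fBinet_pton\fP(3); inside an eBPF program the
address would be filled in by other means.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill addr with an IPv4 source address and port zero, as required
 * by the helper. Returns 0 on success, -1 if ip cannot be parsed. */
static int sketch_fill_bind_addr(struct sockaddr_in *addr, const char *ip)
{
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_port = 0; /* binding to a port is not permitted */
	return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT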
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is
|
|
only possible to shrink the packet as of this writing,
|
|
therefore \fIdelta\fP must be a negative integer.
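.sp
The constraint on \fIdelta\fP can be illustrated with a small
user\-space model. The structure and function are hypothetical, offsets
replace real pointers, and the model allows shrinking down to an empty
packet, which is a simplification.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

struct sketch_xdp {
	uint32_t data;     /* offset of the packet start */
	uint32_t data_end; /* offset one past the packet end */
};

/* Move data_end by delta bytes. Only shrinking is modelled, so delta
 * must be negative and must leave data_end at or after data.
 * Returns 0 on success, -1 otherwise. */
static int sketch_adjust_tail(struct sketch_xdp *x, int delta)
{
	uint32_t len = x->data_end - x->data;
	uint32_t shrink;

	if (delta >= 0)
		return -1;
	shrink = (uint32_t)(-(int64_t)delta);
	if (shrink > len)
		return -1;
	x->data_end -= shrink;
	return 0;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT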
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Retrieve the XFRM state (IP transform framework, see also
|
|
\fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&.
|
|
.sp
|
|
The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP
|
|
pointed to by \fIxfrm_state\fP and of length \fIsize\fP\&.
|
|
.sp
|
|
All values for \fIflags\fP are reserved for future usage, and must
|
|
be left at zero.
|
|
.sp
|
|
This helper is available only if the kernel was compiled with
|
|
\fBCONFIG_XFRM\fP configuration option.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Return a user or a kernel stack in a buffer provided by the bpf program.
|
|
To achieve this, the helper needs \fIregs\fP, which is a pointer
|
|
to the context on which the tracing program is executed.
|
|
To store the stacktrace, the bpf program provides \fIbuf\fP with
|
|
a nonnegative \fIsize\fP\&.
|
|
.sp
|
|
The last argument, \fIflags\fP, holds the number of stack frames to
|
|
skip (from 0 to 255), masked with
|
|
\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
|
|
the following flags:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_F_USER_STACK\fP
|
|
Collect a user space stack instead of a kernel stack.
|
|
.TP
|
|
.B \fBBPF_F_USER_BUILD_ID\fP
|
|
Collect build ID and offset instead of instruction pointers for the user stack,
|
|
only valid if \fBBPF_F_USER_STACK\fP is also specified.
|
|
.UNINDENT
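.sp
Composing a \fIflags\fP value from a skip count and the options above
can be sketched as follows. The macro and function names are
hypothetical, and the numeric values are assumptions made for the
example; the authoritative definitions are in
\fIinclude/uapi/linux/bpf.h\fP\&.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Local mirrors of the constants named above; the numeric values are
 * assumptions for this sketch. */
#define SKETCH_SKIP_FIELD_MASK 0xffULL
#define SKETCH_USER_STACK      (1ULL << 8)
#define SKETCH_USER_BUILD_ID   (1ULL << 11)

/* Build a flags value: the low 8 bits hold the number of frames to
 * skip, the next bits select options. Returns 0 and stores the result
 * in *out, or -1 if build IDs are requested without a user stack. */
static int sketch_stack_flags(uint32_t skip, int user, int build_id,
			      uint64_t *out)
{
	if (build_id && !user)
		return -1;
	*out = skip & SKETCH_SKIP_FIELD_MASK;
	if (user)
		*out |= SKETCH_USER_STACK;
	if (build_id)
		*out |= SKETCH_USER_BUILD_ID;
	return 0;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT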
|
|
.sp
|
|
\fBbpf_get_stack\fP() can collect up to
|
|
\fBPERF_MAX_STACK_DEPTH\fP frames, both kernel and user, subject
to a sufficiently large buffer size. Note that
|
|
this limit can be controlled with the \fBsysctl\fP program, and
|
|
that it should be manually increased in order to profile long
|
|
user stacks (such as stacks for Java programs). To do so, use:
|
|
.INDENT 7.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
# sysctl kernel.perf_event_max_stack=<new value>
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.TP
|
|
.B Return
|
|
A non\-negative value equal to or less than \fIsize\fP on success,
|
|
or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is similar to \fBbpf_skb_load_bytes\fP() in that
|
|
it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP
|
|
from the packet associated to \fIskb\fP, into the buffer pointed to
|
|
by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that
|
|
a fifth argument \fIstart_header\fP exists in order to select a
|
|
base offset to start from. \fIstart_header\fP can be one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_HDR_START_MAC\fP
|
|
Base offset to load data from is \fIskb\fP\(aqs mac header.
|
|
.TP
|
|
.B \fBBPF_HDR_START_NET\fP
|
|
Base offset to load data from is \fIskb\fP\(aqs network header.
|
|
.UNINDENT
|
|
.sp
|
|
In general, "direct packet access" is the preferred method to
|
|
access packet data; however, this helper is particularly useful
|
|
in socket filters where \fIskb\fP\fB\->data\fP does not always point
|
|
to the start of the mac header and where "direct packet access"
|
|
is not available.
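.sp
The role of \fIstart_header\fP can be illustrated with a toy packet
held in a flat buffer. Everything in this user\-space sketch is
hypothetical, including the structure, the names and the values given
to the two base selectors; it only shows how the base offset changes
what \fIoffset\fP refers to.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>
#include <string.h>

enum sketch_hdr_start { SKETCH_START_MAC = 0, SKETCH_START_NET = 1 };

/* Toy packet: a flat buffer plus the offsets of its headers. */
struct sketch_pkt {
	const uint8_t *buf;
	uint32_t len;
	uint32_t mac_off;
	uint32_t net_off;
};

/* Copy len bytes located offset bytes past the chosen header into to.
 * Returns 0 on success, -1 if the range falls outside the packet. */
static int sketch_load_relative(const struct sketch_pkt *p, uint32_t offset,
				void *to, uint32_t len,
				enum sketch_hdr_start start)
{
	uint32_t base = (start == SKETCH_START_NET) ? p->net_off : p->mac_off;
	uint64_t first = (uint64_t)base + offset;

	if (first + len > p->len)
		return -1;
	memcpy(to, p->buf + first, len);
	return 0;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT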
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Do FIB lookup in kernel tables using parameters in \fIparams\fP\&.
|
|
If lookup is successful and result shows packet is to be
|
|
forwarded, the neighbor tables are searched for the nexthop.
|
|
If successful (i.e., FIB lookup shows forwarding and nexthop
|
|
is resolved), the nexthop address is returned in ipv4_dst
|
|
or ipv6_dst based on family, smac is set to mac address of
|
|
egress device, dmac is set to nexthop mac address, rt_metric
|
|
is set to metric from route (IPv4/IPv6 only), and ifindex
|
|
is set to the device index of the nexthop from the FIB lookup.
|
|
.sp
|
|
The \fIplen\fP argument is the size of the passed\-in struct.
The \fIflags\fP argument can be a combination of one or more of the
|
|
following values:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_FIB_LOOKUP_DIRECT\fP
|
|
Do a direct table lookup vs full lookup using FIB
|
|
rules.
|
|
.TP
|
|
.B \fBBPF_FIB_LOOKUP_OUTPUT\fP
|
|
Perform lookup from an egress perspective (default is
|
|
ingress).
|
|
.UNINDENT
|
|
.sp
|
|
\fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or
|
|
\fBstruct sk_buff\fP for tc cls_act programs.
|
|
.TP
|
|
.B Return
|
|
.INDENT 7.0
|
|
.IP \(bu 2
|
|
< 0 if any input argument is invalid
|
|
.IP \(bu 2
|
|
0 on success (packet is forwarded, nexthop neighbor exists)
|
|
.IP \(bu 2
|
|
> 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the
|
|
packet is not forwarded or needs assistance from the full stack
|
|
.UNINDENT
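.sp
A caller typically branches on the three ranges above. The following
user\-space sketch uses hypothetical names and only encodes that
three\-way split; it does not interpret individual
\fBBPF_FIB_LKUP_RET_\fP codes.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
enum sketch_fib_action {
	SKETCH_FIB_BAD_ARGS, /* rc < 0: an input argument is invalid */
	SKETCH_FIB_FORWARD,  /* rc == 0: forward using the lookup result */
	SKETCH_FIB_TO_STACK  /* rc > 0: not forwarded, or needs the stack */
};

/* Map the helper return code rc to one of the three documented cases. */
static enum sketch_fib_action sketch_classify_fib(int rc)
{
	if (rc < 0)
		return SKETCH_FIB_BAD_ARGS;
	if (rc == 0)
		return SKETCH_FIB_FORWARD;
	return SKETCH_FIB_TO_STACK;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT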
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Add an entry to, or update a sockhash \fImap\fP referencing sockets.
|
|
The \fIskops\fP is used as a new value for the entry associated to
|
|
\fIkey\fP\&. \fIflags\fP is one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_NOEXIST\fP
|
|
The entry for \fIkey\fP must not exist in the map.
|
|
.TP
|
|
.B \fBBPF_EXIST\fP
|
|
The entry for \fIkey\fP must already exist in the map.
|
|
.TP
|
|
.B \fBBPF_ANY\fP
|
|
No condition on the existence of the entry for \fIkey\fP\&.
|
|
.UNINDENT
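.sp
The meaning of the three \fIflags\fP values can be sketched as a
precondition check. The names below are hypothetical, and both the
numeric flag values and the specific error codes are assumptions made
for the example rather than a statement of what the kernel returns.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_ANY     0 /* assumed value of BPF_ANY */
#define SKETCH_NOEXIST 1 /* assumed value of BPF_NOEXIST */
#define SKETCH_EXIST   2 /* assumed value of BPF_EXIST */

/* Decide whether an update may proceed, given whether an entry for
 * the key is already present. Returns 0 or a negative errno value. */
static int sketch_check_update(int key_present, uint64_t flags)
{
	if (flags > SKETCH_EXIST)
		return -EINVAL;
	if (flags == SKETCH_NOEXIST && key_present)
		return -EEXIST;
	if (flags == SKETCH_EXIST && !key_present)
		return -ENOENT;
	return 0;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT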
|
|
.sp
|
|
If the \fImap\fP has eBPF programs (parser and verdict), those will
|
|
be inherited by the socket being added. If the socket is
|
|
already attached to eBPF programs, this results in an error.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is used in programs implementing policies at the
|
|
socket level. If the message \fImsg\fP is allowed to pass (i.e. if
|
|
the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
|
|
the socket referenced by \fImap\fP (of type
|
|
\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
|
|
egress interfaces can be used for redirection. The
|
|
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
|
|
distinction (ingress path is selected if the flag is present,
|
|
egress path otherwise). This is the only flag supported for now.
|
|
.TP
|
|
.B Return
|
|
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is used in programs implementing policies at the
|
|
skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e.
|
|
if the verdict eBPF program returns \fBSK_PASS\fP), redirect it
|
|
to the socket referenced by \fImap\fP (of type
|
|
\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
|
|
egress interfaces can be used for redirection. The
|
|
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
|
|
distinction (ingress path is selected if the flag is present,
|
|
egress otherwise). This is the only flag supported for now.
|
|
.TP
|
|
.B Return
|
|
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Encapsulate the packet associated to \fIskb\fP within a Layer 3
|
|
protocol header. This header is provided in the buffer at
|
|
address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates
|
|
the protocol of the header and can be one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBBPF_LWT_ENCAP_SEG6\fP
|
|
IPv6 encapsulation with Segment Routing Header
|
|
(\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH,
|
|
the IPv6 header is computed by the kernel.
|
|
.TP
|
|
.B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP
|
|
Only works if \fIskb\fP contains an IPv6 packet. Insert a
|
|
Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside
|
|
the IPv6 header.
|
|
.UNINDENT
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Store \fIlen\fP bytes from address \fIfrom\fP into the packet
|
|
associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs
|
|
inside the outermost IPv6 Segment Routing Header can be
|
|
modified through this helper.
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Adjust the size allocated to TLVs in the outermost IPv6
|
|
Segment Routing Header contained in the packet associated to
|
|
\fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets
|
|
after the segments are accepted. \fIdelta\fP can be either
positive (growing) or negative (shrinking).
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Apply an IPv6 Segment Routing action of type \fIaction\fP to the
|
|
packet associated to \fIskb\fP\&. Each action takes a parameter
|
|
contained at address \fIparam\fP, and of length \fIparam_len\fP bytes.
|
|
\fIaction\fP can be one of:
|
|
.INDENT 7.0
|
|
.TP
|
|
.B \fBSEG6_LOCAL_ACTION_END_X\fP
|
|
End.X action: Endpoint with Layer\-3 cross\-connect.
|
|
Type of \fIparam\fP: \fBstruct in6_addr\fP\&.
|
|
.TP
|
|
.B \fBSEG6_LOCAL_ACTION_END_T\fP
|
|
End.T action: Endpoint with specific IPv6 table lookup.
|
|
Type of \fIparam\fP: \fBint\fP\&.
|
|
.TP
|
|
.B \fBSEG6_LOCAL_ACTION_END_B6\fP
|
|
End.B6 action: Endpoint bound to an SRv6 policy.
|
|
Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
|
|
.TP
|
|
.B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP
|
|
End.B6.Encap action: Endpoint bound to an SRv6
|
|
encapsulation policy.
|
|
Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
|
|
.UNINDENT
|
|
.sp
|
|
A call to this helper may change the underlying
|
|
packet buffer. Therefore, at load time, all checks on pointers
|
|
previously done by the verifier are invalidated and must be
|
|
performed again, if the helper is used in combination with
|
|
direct packet access.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is used in programs implementing IR decoding, to
|
|
report a successfully decoded key press with \fIscancode\fP,
|
|
\fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be
|
|
translated to a keycode using the rc keymap, and reported as
|
|
an input key down event. After a period a key up event is
|
|
generated. This period can be extended by calling either
|
|
\fBbpf_rc_keydown\fP() again with the same values, or calling
|
|
\fBbpf_rc_repeat\fP().
|
|
.sp
|
|
Some protocols include a toggle bit, in case the button was
|
|
released and pressed again between consecutive scancodes.
|
|
.sp
|
|
The \fIctx\fP should point to the lirc sample as passed into
|
|
the program.
|
|
.sp
|
|
The \fIprotocol\fP is the decoded protocol number (see
|
|
\fBenum rc_proto\fP for some predefined values).
|
|
.sp
|
|
This helper is only available if the kernel was compiled with
|
|
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
|
|
"\fBy\fP".
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
This helper is used in programs implementing IR decoding, to
|
|
report a successfully decoded repeat key message. This delays
|
|
the generation of a key up event for the previously generated
|
|
key down event.
|
|
.sp
|
|
Some IR protocols like NEC have a special IR message for
|
|
repeating the last button, for when a button is held down.
|
|
.sp
|
|
The \fIctx\fP should point to the lirc sample as passed into
|
|
the program.
|
|
.sp
|
|
This helper is only available if the kernel was compiled with
|
|
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
|
|
"\fBy\fP".
|
|
.TP
|
|
.B Return
|
|
0
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Return the cgroup v2 id of the socket associated with the \fIskb\fP\&.
|
|
This is roughly similar to the \fBbpf_get_cgroup_classid\fP()
|
|
helper for cgroup v1 by providing a tag or identifier that
|
|
can be matched on or used for map lookups e.g. to implement
|
|
policy. The cgroup v2 id of a given path in the hierarchy is
|
|
exposed in user space through the f_handle API in order to get
|
|
to the same 64\-bit id.
|
|
.sp
|
|
This helper can be used on TC egress path, but not on ingress,
|
|
and is available only if the kernel was compiled with the
|
|
\fBCONFIG_SOCK_CGROUP_DATA\fP configuration option.
|
|
.TP
|
|
.B Return
|
|
The id is returned or 0 in case the id could not be retrieved.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Return the id of the cgroup v2 that is an ancestor of the cgroup associated
|
|
with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at
|
|
\fIancestor_level\fP zero and each step down the hierarchy
|
|
increments the level. If \fIancestor_level\fP == level of cgroup
|
|
associated with \fIskb\fP, then the return value will be the same as that
|
|
of \fBbpf_skb_cgroup_id\fP().
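.sp
The level numbering can be pictured with a toy model in which each
cgroup carries the list of ids on its path from the root. The
structure and function below are hypothetical user\-space code and are
unrelated to the kernel data structures.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Toy cgroup: ids[0] is the root, ids[level] is the cgroup itself. */
struct sketch_cgroup {
	const uint64_t *ids;
	int level;
};

/* Return the id of the ancestor at ancestor_level, or 0 when there is
 * no ancestor at that level. */
static uint64_t sketch_ancestor_id(const struct sketch_cgroup *cg,
				   int ancestor_level)
{
	if (ancestor_level < 0 || ancestor_level > cg->level)
		return 0;
	return cg->ids[ancestor_level];
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Asking for the level of the cgroup itself yields its own id, which
matches the statement above about \fBbpf_skb_cgroup_id\fP().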
|
|
.sp
|
|
The helper is useful to implement policies based on cgroups
|
|
that are higher in the hierarchy than the immediate cgroup associated
|
|
with \fIskb\fP\&.
|
|
.sp
|
|
The format of the returned id and the helper limitations are the same as in
|
|
\fBbpf_skb_cgroup_id\fP().
|
|
.TP
|
|
.B Return
|
|
The id is returned or 0 in case the id could not be retrieved.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBu64 bpf_get_current_cgroup_id(void)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Return
|
|
A 64\-bit integer containing the current cgroup id based
|
|
on the cgroup within which the current task is running.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Get the pointer to the local storage area.
|
|
The type and the size of the local storage is defined
|
|
by the \fImap\fP argument.
|
|
The \fIflags\fP meaning is specific for each map type,
|
|
and has to be 0 for cgroup local storage.
|
|
.sp
|
|
Depending on the bpf program type, a local storage area
|
|
can be shared between multiple instances of the bpf program,
|
|
running simultaneously.
|
|
.sp
|
|
The user is responsible for synchronizing access, for example by
using the \fBBPF_STX_XADD\fP instruction to alter the shared data.
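.sp
As a user\-space illustration of an atomic update to shared data, the
following sketch relies on the __sync_fetch_and_add compiler builtin
provided by GCC and Clang. Treating this builtin as the C\-level
counterpart of \fBBPF_STX_XADD\fP is an assumption of the example, and
the function name is hypothetical.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Add delta to a shared counter with an atomic read-modify-write, so
 * that concurrent updates are not lost. */
static void sketch_counter_add(uint64_t *counter, uint64_t delta)
{
	__sync_fetch_and_add(counter, delta);
}
```
.ft P
.fi
.UNINDENT
.UNINDENT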
|
|
.TP
|
|
.B Return
|
|
Pointer to the local storage area.
|
|
.UNINDENT
|
|
.TP
|
|
.B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
|
|
.INDENT 7.0
|
|
.TP
|
|
.B Description
|
|
Select a \fBSO_REUSEPORT\fP socket from a
\fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&.
It checks that the selected socket matches the incoming
request in the socket buffer.
|
|
.TP
|
|
.B Return
|
|
0 on success, or a negative error in case of failure.
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.SH EXAMPLES
|
|
.sp
|
|
Usage examples for most of the eBPF helpers listed in this manual page are
|
|
available within the Linux kernel sources, at the following locations:
|
|
.INDENT 0.0
|
|
.IP \(bu 2
|
|
\fIsamples/bpf/\fP
|
|
.IP \(bu 2
|
|
\fItools/testing/selftests/bpf/\fP
|
|
.UNINDENT
|
|
.SH LICENSE
|
|
.sp
|
|
eBPF programs can have an associated license, passed along with the bytecode
|
|
instructions to the kernel when the programs are loaded. The format for that
|
|
string is identical to the one in use for kernel modules (Dual licenses, such
|
|
as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
|
|
programs that are compatible with the GNU General Public License (GPL).
|
|
.sp
|
|
In order to use such helpers, the eBPF program must be loaded with the correct
|
|
license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this
|
|
generally translates into the C source code of the program containing a line
|
|
similar to the following:
|
|
.INDENT 0.0
|
|
.INDENT 3.5
|
|
.sp
|
|
.nf
|
|
.ft C
|
|
char ____license[] __attribute__((section("license"), used)) = "GPL";
|
|
.ft P
|
|
.fi
|
|
.UNINDENT
|
|
.UNINDENT
|
|
.SH IMPLEMENTATION
|
|
.sp
|
|
This manual page is an effort to document the existing eBPF helper functions.
|
|
But as of this writing, the BPF sub\-system is under heavy development. New eBPF
|
|
program or map types are added, along with new helper functions. Some helpers
|
|
are occasionally made available for additional program types. So in spite of
|
|
the efforts of the community, this page might not be up\-to\-date. If you want to
|
|
check for yourself what helper functions exist in your kernel, or what types of
programs they can support, here are some files in the kernel tree that you
|
|
may be interested in:
|
|
.INDENT 0.0
|
|
.IP \(bu 2
|
|
\fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list
|
|
of all helper functions, as well as many other BPF definitions including most
|
|
of the flags, structs or constants used by the helpers.
|
|
.IP \(bu 2
|
|
\fInet/core/filter.c\fP contains the definition of most network\-related helper
|
|
functions, and the list of program types from which they can be used.
|
|
.IP \(bu 2
|
|
\fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related
|
|
helpers.
|
|
.IP \(bu 2
|
|
\fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types
|
|
of eBPF maps are used with a given helper function.
|
|
.IP \(bu 2
|
|
\fIkernel/bpf/\fP directory contains other files in which additional helpers are
|
|
defined (for cgroups, sockmaps, etc.).
|
|
.UNINDENT
|
|
.sp
|
|
Compatibility between helper functions and program types can generally be found
|
|
in the files where helper functions are defined. Look for the \fBstruct
|
|
bpf_func_proto\fP objects and for functions returning them: these functions
|
|
contain a list of helpers that a given program type can call. Note that the
|
|
\fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call
|
|
other functions, themselves allowing access to additional helpers. The
|
|
requirement for GPL license is also in those \fBstruct bpf_func_proto\fP\&.
|
|
.sp
|
|
Compatibility between helper functions and map types can be found in the
|
|
\fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&.
|
|
.sp
|
|
Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP
|
|
pointers for network processing are listed in function
|
|
\fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&.
|
|
.SH SEE ALSO
|
|
.sp
|
|
\fBbpf\fP(2),
|
|
\fBcgroups\fP(7),
|
|
\fBip\fP(8),
|
|
\fBperf_event_open\fP(2),
|
|
\fBsendmsg\fP(2),
|
|
\fBsocket\fP(7),
|
|
\fBtc\-bpf\fP(8)
|
|
.\" Generated by docutils manpage writer.
|
|
.
|