2015-03-13 19:16:32 +00:00
|
|
|
.\" Copyright (C) 2015 Alexei Starovoitov <ast@kernel.org>
|
2015-07-22 14:45:08 +00:00
|
|
|
.\" and Copyright (C) 2015 Michael Kerrisk <mtk.manpages@gmail.com>
|
2015-03-13 19:16:32 +00:00
|
|
|
.\"
|
|
|
|
.\" %%%LICENSE_START(VERBATIM)
|
|
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
|
|
.\" preserved on all copies.
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
|
|
.\" permission notice identical to this one.
|
|
|
|
.\"
|
|
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
|
|
.\" have taken the same level of care in the production of this manual,
|
|
|
|
.\" which is licensed free of charge, as they might when working
|
|
|
|
.\" professionally.
|
|
|
|
.\"
|
|
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
|
|
.\" %%%LICENSE_END
|
|
|
|
.\"
|
2015-07-23 14:12:28 +00:00
|
|
|
.TH BPF 2 2015-07-23 "Linux" "Linux Programmer's Manual"
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH NAME
|
2015-07-23 10:11:21 +00:00
|
|
|
bpf - perform a command on an extended BPF map or program
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH SYNOPSIS
|
|
|
|
.nf
|
|
|
|
.B #include <linux/bpf.h>
|
|
|
|
.sp
|
|
|
|
.BI "int bpf(int " cmd ", union bpf_attr *" attr ", unsigned int " size );
|
|
|
|
.SH DESCRIPTION
|
2015-05-24 19:47:35 +00:00
|
|
|
The
|
2015-05-25 17:49:29 +00:00
|
|
|
.BR bpf ()
|
2015-05-26 10:56:34 +00:00
|
|
|
system call performs a range of operations related to extended
|
|
|
|
Berkeley Packet Filters.
|
|
|
|
Extended BPF (or eBPF) is similar to
|
2015-06-04 11:11:19 +00:00
|
|
|
the original ("classic") BPF (cBPF) used to filter network packets.
|
|
|
|
For both cBPF and eBPF programs,
|
2015-05-26 10:56:34 +00:00
|
|
|
the kernel statically analyzes the programs before loading them,
|
|
|
|
in order to ensure that they cannot harm the running system.
|
2015-03-13 19:16:32 +00:00
|
|
|
.P
|
2015-06-07 13:33:34 +00:00
|
|
|
eBPF extends cBPF in multiple ways, including the ability to call
|
2015-07-22 12:53:08 +00:00
|
|
|
a fixed set of in-kernel helper functions
|
|
|
|
.\" See 'enum bpf_func_id' in include/uapi/linux/bpf.h
|
|
|
|
(via the
|
2015-05-26 10:56:34 +00:00
|
|
|
.B BPF_CALL
|
|
|
|
opcode extension provided by eBPF)
|
2015-07-22 14:45:08 +00:00
|
|
|
and access shared data structures such as eBPF maps.
|
2015-07-23 13:36:42 +00:00
|
|
|
.\"
|
2015-03-13 19:16:32 +00:00
|
|
|
.SS Extended BPF Design/Architecture
|
2015-07-22 14:45:08 +00:00
|
|
|
.\"
|
2015-05-26 10:56:34 +00:00
|
|
|
.\" FIXME In the following line, what does "different data types" mean?
|
|
|
|
.\" Are the values in a map not just blobs?
|
2015-07-23 10:11:21 +00:00
|
|
|
.\" Daniel Borkmann commented:
|
2015-07-22 14:45:08 +00:00
|
|
|
.\" Sort of, currently, these blobs can have different sizes of keys
|
|
|
|
.\" and values (you can even have structs as keys). For the map itself
|
|
|
|
.\" they are treated as blob internally. However, recently, bpf tail call
|
|
|
|
.\" got added where you can lookup another program from an array map and
|
|
|
|
.\" call into it. Here, that particular type of map can only have entries
|
|
|
|
.\" of type of eBPF program fd. I think, if needed, adding a paragraph to
|
|
|
|
.\" the tail call could be done as follow-up after we have an initial man
|
|
|
|
.\" page in the tree included.
|
|
|
|
.\"
|
2015-07-23 10:11:21 +00:00
|
|
|
eBPF maps are a generic data structure for storage of different data types.
|
2015-03-13 19:16:32 +00:00
|
|
|
A user process can create multiple maps (with key/value-pairs being
|
2015-05-25 17:49:29 +00:00
|
|
|
opaque bytes of data) and access them via file descriptors.
|
2015-07-23 09:56:44 +00:00
|
|
|
Different eBPF programs can access the same maps in parallel.
|
2015-06-04 11:11:19 +00:00
|
|
|
It's up to the user process and eBPF program to decide what they store
|
2015-03-13 19:16:32 +00:00
|
|
|
inside maps.
|
|
|
|
.P
|
2015-07-23 10:11:21 +00:00
|
|
|
eBPF programs are loaded by the user
|
2015-03-13 19:16:32 +00:00
|
|
|
process and automatically unloaded when the process exits.
|
2015-07-22 14:45:08 +00:00
|
|
|
.\"
|
|
|
|
.\" FIXME Daniel Borkmann commented about the preceding sentence:
|
|
|
|
.\"
|
|
|
|
.\" Generally that's true. Btw, in 4.1 kernel, tc(8) also got support for
|
|
|
|
.\" eBPF classifier and actions, and here it's slightly different: in tc,
|
|
|
|
.\" we load the programs, maps etc, and push down the eBPF program fd in
|
|
|
|
.\" order to let the kernel hold reference on the program itself.
|
2015-07-23 10:11:21 +00:00
|
|
|
.\"
|
2015-07-22 14:45:08 +00:00
|
|
|
.\" Thus, there, the program fd that the application owns is gone when the
|
|
|
|
.\" application terminates, but the eBPF program itself still lives on
|
|
|
|
.\" inside the kernel.
|
|
|
|
.\"
|
|
|
|
.\" Probably something should be said about this in this man page.
|
|
|
|
.\"
|
2015-06-04 11:11:19 +00:00
|
|
|
Each program is a set of instructions that is safe to run until
|
2015-05-24 10:00:38 +00:00
|
|
|
its completion.
|
2015-06-04 11:11:19 +00:00
|
|
|
An in-kernel verifier statically determines that the eBPF program
|
2015-05-24 10:00:38 +00:00
|
|
|
terminates and is safe to execute.
|
2015-06-04 10:48:35 +00:00
|
|
|
During verification, the kernel increments reference counts for each of
|
|
|
|
the maps that the eBPF program uses,
|
2015-07-23 10:11:21 +00:00
|
|
|
so that the attached maps can't be removed until the program is unloaded.
|
2015-05-26 10:56:34 +00:00
|
|
|
|
2015-06-04 11:11:19 +00:00
|
|
|
eBPF programs can be attached to different events.
|
2015-06-07 15:00:20 +00:00
|
|
|
These events can be the arrival of network packets, tracing
|
2015-07-23 10:11:21 +00:00
|
|
|
events, classification events by network queueing disciplines
|
|
|
|
(for eBPF programs attached to a
|
2015-06-07 15:00:20 +00:00
|
|
|
.BR tc (8)
|
|
|
|
classifier), and other types that may be added in the future.
|
2015-06-04 11:11:19 +00:00
|
|
|
A new event triggers execution of the eBPF program, which
|
2015-07-22 12:53:08 +00:00
|
|
|
may store information about the event in eBPF maps.
|
2015-06-04 11:11:19 +00:00
|
|
|
Beyond storing data, eBPF programs may call a fixed set of
|
2015-06-04 10:48:35 +00:00
|
|
|
in-kernel helper functions.
|
2015-07-22 12:53:08 +00:00
|
|
|
The same eBPF program can be attached to multiple events and different
|
2015-06-07 13:33:34 +00:00
|
|
|
eBPF programs can access the same map:
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
.nf
tracing     tracing     tracing     packet      packet
event A     event B     event C     on eth0     on eth1
 |             |          |           |           |
 |             |          |           |           |
 --> tracing <--      tracing      socket     tc ingress
      prog_1           prog_2      prog_3     classifier
      |  |               |           |          prog_4
   |---  -----|  |-------|         map_3
 map_1       map_2
.fi
.in
|
2015-07-23 13:36:42 +00:00
|
|
|
.\"
|
2015-05-24 19:47:35 +00:00
|
|
|
.SS Arguments
|
2015-05-26 10:56:34 +00:00
|
|
|
The operation to be performed by the
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR bpf ()
|
2015-05-26 10:56:34 +00:00
|
|
|
system call is determined by the
|
2015-03-13 19:16:32 +00:00
|
|
|
.IR cmd
|
2015-07-22 12:53:08 +00:00
|
|
|
argument.
|
|
|
|
Each operation takes an accompanying argument,
|
|
|
|
provided via
|
|
|
|
.IR attr ,
|
|
|
|
which is a pointer to a union of type
|
|
|
|
.IR bpf_attr
|
|
|
|
(see below).
|
|
|
|
The
|
|
|
|
.I size
|
|
|
|
argument is the size of the union pointed to by
|
|
|
|
.IR attr .
|
|
|
|
|
|
|
|
The value provided in
|
|
|
|
.IR cmd
|
|
|
|
is one of the following:
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_CREATE
|
2015-07-23 10:11:21 +00:00
|
|
|
Create a map and return a file descriptor that refers to the map.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_LOOKUP_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
Look up an element by key in a specified map and return its value.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_UPDATE_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
Create or update an element (key/value pair) in a specified map.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_DELETE_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
Look up and delete an element by key in a specified map.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_GET_NEXT_KEY
|
2015-05-26 10:56:34 +00:00
|
|
|
Look up an element by key in a specified map and return the key
|
|
|
|
of the next element.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_PROG_LOAD
|
2015-06-07 15:00:20 +00:00
|
|
|
Verify and load an eBPF program,
|
|
|
|
returning a new file descriptor associated with the program.
|
2015-03-13 19:16:32 +00:00
|
|
|
.P
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.I bpf_attr
|
|
|
|
union consists of various anonymous structures that are used by different
|
|
|
|
.BR bpf ()
|
|
|
|
commands:
|
|
|
|
|
|
|
|
.in +4n
.nf
union bpf_attr {
    struct {    /* Used by BPF_MAP_CREATE */
        __u32         map_type;
        __u32         key_size;    /* size of key in bytes */
        __u32         value_size;  /* size of value in bytes */
        __u32         max_entries; /* maximum number of entries
                                      in a map */
    };

    struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
                   commands */
        __u32         map_fd;
        __aligned_u64 key;
        union {
            __aligned_u64 value;
            __aligned_u64 next_key;
        };
        __u64         flags;
    };

    struct {    /* Used by BPF_PROG_LOAD */
        __u32         prog_type;
        __u32         insn_cnt;
        __aligned_u64 insns;      /* 'const struct bpf_insn *' */
        __aligned_u64 license;    /* 'const char *' */
        __u32         log_level;  /* verbosity level of verifier */
        __u32         log_size;   /* size of user buffer */
        __aligned_u64 log_buf;    /* user supplied 'char *'
                                     buffer */
        __u32         kern_version;
                                  /* checked when prog_type=kprobe
                                     (since Linux 4.1) */
.\"                 commit 2541517c32be2531e0da59dfd7efc1ce844644f5
    };
} __attribute__((aligned(8)));
.fi
.in
|
2015-07-23 13:36:42 +00:00
|
|
|
.\"
|
2015-07-22 14:45:08 +00:00
|
|
|
.SS eBPF maps
|
2015-07-22 15:58:46 +00:00
|
|
|
Maps are a generic data structure for storage of different types of data.
|
|
|
|
They allow sharing of data between eBPF kernel programs,
|
|
|
|
and also between kernel and user-space applications.
|
2015-05-25 17:49:29 +00:00
|
|
|
|
|
|
|
Each map type has the following attributes:
|
|
|
|
|
|
|
|
.PD 0
|
|
|
|
.IP * 3
|
|
|
|
type
|
|
|
|
.IP *
|
2015-05-26 06:34:12 +00:00
|
|
|
maximum number of elements
|
2015-05-25 17:49:29 +00:00
|
|
|
.IP *
|
|
|
|
key size in bytes
|
|
|
|
.IP *
|
|
|
|
value size in bytes
|
|
|
|
.PD
|
|
|
|
.PP
|
2015-05-26 10:56:34 +00:00
|
|
|
The following wrapper functions demonstrate how various
|
|
|
|
.BR bpf ()
|
|
|
|
commands can be used to access the maps.
|
2015-05-24 10:00:38 +00:00
|
|
|
The functions use the
|
2015-03-13 19:16:32 +00:00
|
|
|
.IR cmd
|
|
|
|
argument to invoke different operations.
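The wrapper functions in this section call
bpf()
and a helper named
ptr_to_u64(),
neither of which is defined in the examples themselves.
A minimal sketch of both follows, assuming that the C library provides no wrapper for this system call, so that it is invoked via
syscall(2);
ptr_to_u64()
is an illustrative helper, not part of any kernel or C library API.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Illustrative helper: widen a user-space pointer to the 64-bit
   integer form stored in the __aligned_u64 fields of bpf_attr. */
static inline uint64_t
ptr_to_u64(const void *ptr)
{
    return (uint64_t) (uintptr_t) ptr;
}

/* Thin wrapper: invoke the bpf() system call directly. */
static inline int
bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int) syscall(__NR_bpf, cmd, attr, size);
}
```

Going through
uintptr_t
keeps the conversion well defined on both 32-bit and 64-bit systems.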
|
2015-07-22 14:45:08 +00:00
|
|
|
.TP
|
2015-05-26 10:56:34 +00:00
|
|
|
.B BPF_MAP_CREATE
|
|
|
|
The
|
2015-03-13 19:16:32 +00:00
|
|
|
.B BPF_MAP_CREATE
|
2015-06-09 11:13:13 +00:00
|
|
|
command creates a new map,
|
|
|
|
returning a new file descriptor that refers to the map.
|
2015-05-26 10:56:34 +00:00
|
|
|
|
|
|
|
.in +4n
.nf
int
bpf_create_map(enum bpf_map_type map_type,
               unsigned int key_size,
               unsigned int value_size,
               unsigned int max_entries)
{
    union bpf_attr attr = {
        .map_type    = map_type,
        .key_size    = key_size,
        .value_size  = value_size,
        .max_entries = max_entries
    };

    return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
The new map has the type specified by
|
|
|
|
.IR map_type ,
|
|
|
|
and attributes as specified in
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR key_size ,
|
|
|
|
.IR value_size ,
|
2015-05-26 10:56:34 +00:00
|
|
|
and
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR max_entries .
|
2015-07-22 20:02:27 +00:00
|
|
|
On success, this operation returns a file descriptor.
|
2015-05-24 10:00:38 +00:00
|
|
|
On error, \-1 is returned and
|
2015-03-13 19:16:32 +00:00
|
|
|
.I errno
|
2015-05-24 19:31:01 +00:00
|
|
|
is set to
|
|
|
|
.BR EINVAL ,
|
|
|
|
.BR EPERM ,
|
|
|
|
or
|
|
|
|
.BR ENOMEM .
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-07-23 10:11:21 +00:00
|
|
|
The
|
2015-03-13 19:16:32 +00:00
|
|
|
.I key_size
|
|
|
|
and
|
|
|
|
.I value_size
|
2015-07-23 10:11:21 +00:00
|
|
|
attributes will be used by the verifier during program loading
|
|
|
|
to check that the program is calling
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR bpf_map_*_elem ()
|
|
|
|
helper functions with a correctly initialized
|
2015-03-13 19:16:32 +00:00
|
|
|
.I key
|
2015-07-22 12:53:08 +00:00
|
|
|
and to check that the program doesn't access the map element
|
2015-03-13 19:16:32 +00:00
|
|
|
.I value
|
|
|
|
beyond the specified
|
2015-05-25 17:49:29 +00:00
|
|
|
.IR value_size .
|
2015-05-26 10:56:34 +00:00
|
|
|
For example, when a map is created with a
|
|
|
|
.IR key_size
|
2015-07-22 12:53:08 +00:00
|
|
|
of 8 and the eBPF program calls
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
|
2015-03-13 19:16:32 +00:00
|
|
|
.nf
|
|
|
|
bpf_map_lookup_elem(map_fd, fp - 4)
|
|
|
|
.fi
|
2015-05-24 19:31:01 +00:00
|
|
|
.in
|
|
|
|
|
2015-03-13 19:16:32 +00:00
|
|
|
the program will be rejected,
|
2015-05-24 19:31:01 +00:00
|
|
|
since the in-kernel helper function
|
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
bpf_map_lookup_elem(map_fd, void *key)
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-07-22 20:02:27 +00:00
|
|
|
expects to read 8 bytes from the location pointed to by
|
|
|
|
.IR key ,
|
|
|
|
but the
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR "fp\ -\ 4"
|
2015-07-22 20:02:27 +00:00
|
|
|
(where
|
|
|
|
.I fp
|
|
|
|
is the top of the stack)
|
2015-05-24 19:31:01 +00:00
|
|
|
starting address will cause out-of-bounds stack access.
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
Similarly, when a map is created with a
|
|
|
|
.I value_size
|
2015-07-22 12:53:08 +00:00
|
|
|
of 1 and the eBPF program contains
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
|
2015-03-13 19:16:32 +00:00
|
|
|
.nf
|
|
|
|
value = bpf_map_lookup_elem(...);
|
2015-05-24 19:31:01 +00:00
|
|
|
*(u32 *) value = 1;
|
2015-03-13 19:16:32 +00:00
|
|
|
.fi
|
2015-05-24 19:31:01 +00:00
|
|
|
.in
|
|
|
|
|
2015-03-13 19:16:32 +00:00
|
|
|
the program will be rejected, since it accesses the
|
|
|
|
.I value
|
2015-05-24 19:31:01 +00:00
|
|
|
pointer beyond the specified 1 byte
|
|
|
|
.I value_size
|
|
|
|
limit.
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-07-22 12:53:08 +00:00
|
|
|
Currently, the following values are supported for
|
|
|
|
.IR map_type :
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
.nf
enum bpf_map_type {
    BPF_MAP_TYPE_UNSPEC,  /* Reserve 0 as invalid map type */
    BPF_MAP_TYPE_HASH,
    BPF_MAP_TYPE_ARRAY,
    BPF_MAP_TYPE_PROG_ARRAY,
};
.fi
.in
|
|
|
|
|
2015-03-13 19:16:32 +00:00
|
|
|
.I map_type
|
2015-05-24 10:00:38 +00:00
|
|
|
selects one of the available map implementations in the kernel.
|
2015-07-22 12:53:08 +00:00
|
|
|
.\" FIXME We need an explanation of why one might choose each of
|
|
|
|
.\" these map implementations
|
2015-05-25 17:49:29 +00:00
|
|
|
For all map types,
|
2015-07-22 12:53:08 +00:00
|
|
|
eBPF programs access maps with the same
|
|
|
|
.BR bpf_map_lookup_elem ()
|
|
|
|
and
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR bpf_map_update_elem ()
|
2015-03-13 19:16:32 +00:00
|
|
|
helper functions.
|
2015-07-22 14:45:08 +00:00
|
|
|
Further details of the various map types are given below.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_LOOKUP_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.B BPF_MAP_LOOKUP_ELEM
|
|
|
|
command looks up an element with a given
|
|
|
|
.I key
|
|
|
|
in the map referred to by the file descriptor
|
|
|
|
.IR fd .
|
|
|
|
|
|
|
|
.in +4n
.nf
int
bpf_lookup_elem(int fd, const void *key, void *value)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
        .value  = ptr_to_u64(value),
    };

    return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
If an element is found,
|
|
|
|
the operation returns zero and stores the element's value into
|
2015-06-09 11:13:13 +00:00
|
|
|
.IR value ,
|
|
|
|
which must point to a buffer of
|
|
|
|
.I value_size
|
|
|
|
bytes.
|
2015-05-26 10:56:34 +00:00
|
|
|
|
|
|
|
If no element is found, the operation returns \-1 and sets
|
2015-03-13 19:16:32 +00:00
|
|
|
.I errno
|
2015-05-24 19:31:01 +00:00
|
|
|
to
|
|
|
|
.BR ENOENT .
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_UPDATE_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.B BPF_MAP_UPDATE_ELEM
|
|
|
|
command
|
|
|
|
creates or updates an element with a given
|
|
|
|
.I key/value
|
|
|
|
in the map referred to by the file descriptor
|
|
|
|
.IR fd .
|
|
|
|
|
|
|
|
.in +4n
.nf
int
bpf_update_elem(int fd, const void *key, const void *value,
                uint64_t flags)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
        .value  = ptr_to_u64(value),
        .flags  = flags,
    };

    return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
2015-03-13 19:16:32 +00:00
|
|
|
.I flags
|
2015-05-26 10:56:34 +00:00
|
|
|
argument should be specified as one of the following:
|
|
|
|
.RS
|
|
|
|
.TP
|
|
|
|
.B BPF_ANY
|
|
|
|
Create a new element or update an existing element.
|
|
|
|
.TP
|
|
|
|
.B BPF_NOEXIST
|
|
|
|
Create a new element only if it did not exist.
|
|
|
|
.TP
|
|
|
|
.B BPF_EXIST
|
|
|
|
Update an existing element.
|
|
|
|
.RE
|
|
|
|
.IP
|
|
|
|
On success, the operation returns zero.
|
2015-03-13 19:16:32 +00:00
|
|
|
On error, \-1 is returned and
|
|
|
|
.I errno
|
2015-05-24 19:31:01 +00:00
|
|
|
is set to
|
|
|
|
.BR EINVAL ,
|
|
|
|
.BR EPERM ,
|
|
|
|
.BR ENOMEM ,
|
|
|
|
or
|
|
|
|
.BR E2BIG .
|
2015-03-13 19:16:32 +00:00
|
|
|
.B E2BIG
|
2015-05-26 10:56:34 +00:00
|
|
|
indicates that the number of elements in the map reached the
|
2015-03-13 19:16:32 +00:00
|
|
|
.I max_entries
|
|
|
|
limit specified at map creation time.
|
|
|
|
.B EEXIST
|
2015-05-26 10:56:34 +00:00
|
|
|
will be returned if
|
|
|
|
.I flags
|
|
|
|
specifies
|
|
|
|
.B BPF_NOEXIST
|
|
|
|
and the element with
|
2015-05-24 19:31:01 +00:00
|
|
|
.I key
|
|
|
|
already exists in the map.
|
2015-03-13 19:16:32 +00:00
|
|
|
.B ENOENT
|
2015-07-23 10:11:21 +00:00
|
|
|
will be returned if
|
2015-05-26 10:56:34 +00:00
|
|
|
.I flags
|
|
|
|
specifies
|
|
|
|
.B BPF_EXIST
|
|
|
|
and the element with
|
2015-05-24 19:31:01 +00:00
|
|
|
.I key
|
|
|
|
doesn't exist in the map.
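The interaction of the three
flags
values with the error cases above can be modeled in a few lines of user-space C.
The following is a toy model only, not kernel code: the
MODEL_*
names, the two-entry limit, and the relative order of the checks are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for BPF_ANY, BPF_NOEXIST, and BPF_EXIST. */
enum model_flag { MODEL_ANY, MODEL_NOEXIST, MODEL_EXIST };

#define MODEL_MAX_ENTRIES 2     /* plays the role of max_entries */

struct model_map {
    uint32_t keys[MODEL_MAX_ENTRIES];
    uint32_t values[MODEL_MAX_ENTRIES];
    unsigned int used;
};

/* Returns 0 on success, or -1 with errno set to EEXIST, E2BIG,
   or ENOENT, following the rules described in the text. */
static int
model_update(struct model_map *m, uint32_t key, uint32_t value,
             enum model_flag flag)
{
    unsigned int i;

    for (i = 0; i < m->used; i++)
        if (m->keys[i] == key)
            break;

    if (i < m->used) {                      /* key already present */
        if (flag == MODEL_NOEXIST) {
            errno = EEXIST;
            return -1;
        }
        m->values[i] = value;
        return 0;
    }

    if (m->used == MODEL_MAX_ENTRIES) {     /* key absent, map full */
        errno = E2BIG;
        return -1;
    }
    if (flag == MODEL_EXIST) {              /* key absent */
        errno = ENOENT;
        return -1;
    }

    m->keys[m->used] = key;
    m->values[m->used] = value;
    m->used++;
    return 0;
}
```

In this model, updating an existing key never fails with
E2BIG,
since no new entry is needed.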
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_DELETE_ELEM
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.B BPF_MAP_DELETE_ELEM
|
|
|
|
command
|
|
|
|
deletes the element whose key is
|
|
|
|
.I key
|
|
|
|
from the map referred to by the file descriptor
|
|
|
|
.IR fd .
|
|
|
|
|
|
|
|
.in +4n
.nf
int
bpf_delete_elem(int fd, const void *key)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
    };

    return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
On success, zero is returned.
|
|
|
|
If the element is not found, \-1 is returned and
|
2015-03-13 19:16:32 +00:00
|
|
|
.I errno
|
2015-05-26 10:56:34 +00:00
|
|
|
is set to
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR ENOENT .
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_MAP_GET_NEXT_KEY
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.B BPF_MAP_GET_NEXT_KEY
|
|
|
|
command looks up an element by
|
|
|
|
.I key
|
|
|
|
in the map referred to by the file descriptor
|
2015-07-23 10:11:21 +00:00
|
|
|
.IR fd
|
2015-05-26 10:56:34 +00:00
|
|
|
and sets the
|
|
|
|
.I next_key
|
|
|
|
pointer to the key of the next element.
|
|
|
|
|
2015-03-13 19:16:32 +00:00
|
|
|
.in +4n
.nf
int
bpf_get_next_key(int fd, const void *key, void *next_key)
{
    union bpf_attr attr = {
        .map_fd   = fd,
        .key      = ptr_to_u64(key),
        .next_key = ptr_to_u64(next_key),
    };

    return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-06-09 11:13:13 +00:00
|
|
|
If
|
|
|
|
.I key
|
|
|
|
is found, the operation returns zero and sets the
|
|
|
|
.I next_key
|
|
|
|
pointer to the key of the next element.
|
2015-03-13 19:16:32 +00:00
|
|
|
If
|
|
|
|
.I key
|
2015-05-26 10:56:34 +00:00
|
|
|
is not found, the operation returns zero and sets the
|
2015-03-13 19:16:32 +00:00
|
|
|
.I next_key
|
|
|
|
pointer to the key of the first element.
|
|
|
|
If
|
|
|
|
.I key
|
2015-05-26 10:56:34 +00:00
|
|
|
is the last element, \-1 is returned and
|
2015-03-13 19:16:32 +00:00
|
|
|
.I errno
|
2015-05-26 10:56:34 +00:00
|
|
|
is set to
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR ENOENT .
|
2015-05-24 10:00:38 +00:00
|
|
|
Other possible
|
2015-03-13 19:16:32 +00:00
|
|
|
.I errno
|
2015-05-24 19:31:01 +00:00
|
|
|
values are
|
|
|
|
.BR ENOMEM ,
|
|
|
|
.BR EFAULT ,
|
|
|
|
.BR EPERM ,
|
|
|
|
and
|
|
|
|
.BR EINVAL .
|
2015-03-13 19:16:32 +00:00
|
|
|
This method can be used to iterate over all elements in the map.
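The iteration described above can be sketched as a loop that feeds each returned key back in as the next starting point, stopping when
ENOENT
is reported.
The function name
count_map_keys(),
the 64-byte key limit, and the choice of an all-0xff starting key (assumed not to be present in the map) are assumptions of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Count the keys in the map referred to by 'fd', whose keys are
   'key_size' bytes long (at most 64 bytes in this sketch).
   Returns the count, or -1 with errno set on error. */
static long
count_map_keys(int fd, size_t key_size)
{
    unsigned char key[64], next_key[64];
    union bpf_attr attr;
    long count = 0;

    if (key_size == 0 || key_size > sizeof(key)) {
        errno = EINVAL;
        return -1;
    }

    /* Assumption: this key is not in the map, so the first call
       yields the key of the first element. */
    memset(key, 0xff, sizeof(key));

    for (;;) {
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = fd;
        attr.key = (uint64_t) (uintptr_t) key;
        attr.next_key = (uint64_t) (uintptr_t) next_key;

        if (syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &attr,
                    sizeof(attr)) == -1)
            return (errno == ENOENT) ? count : -1;

        count++;
        memcpy(key, next_key, key_size);    /* continue from here */
    }
}
```

If another process or an eBPF program deletes the current key while the loop runs, the next call starts again from the first element, so a caller that needs an exact count has to account for concurrent modification.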
|
|
|
|
.TP
|
|
|
|
.B close(map_fd)
|
2015-05-26 10:56:34 +00:00
|
|
|
Delete the map referred to by the file descriptor
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR map_fd .
|
2015-05-26 10:56:34 +00:00
|
|
|
When the user-space program that created a map exits, all maps will
|
2015-07-22 14:45:08 +00:00
|
|
|
be deleted automatically (but see NOTES).
|
|
|
|
.\"
|
|
|
|
.SS eBPF map types
|
|
|
|
The following map types are supported:
|
|
|
|
.TP
|
|
|
|
.B BPF_MAP_TYPE_HASH
|
|
|
|
.\" commit 0f8e4bd8a1fc8c4185f1630061d0a1f2d197a475
|
|
|
|
Hash-table maps have the following characteristics:
|
|
|
|
.RS
|
|
|
|
.IP * 3
|
|
|
|
Maps are created and destroyed by user-space programs.
|
|
|
|
Both user-space and eBPF programs
|
2015-07-22 20:02:27 +00:00
|
|
|
can perform lookup, update, and delete operations.
|
2015-07-22 14:45:08 +00:00
|
|
|
.IP *
|
|
|
|
The kernel takes care of allocating and freeing key/value pairs.
|
|
|
|
.IP *
|
|
|
|
The
|
|
|
|
.BR map_update_elem ()
|
|
|
|
helper will fail to insert a new element when the
|
|
|
|
.I max_entries
|
|
|
|
limit is reached.
|
|
|
|
(This ensures that eBPF programs cannot exhaust memory.)
|
|
|
|
.IP *
|
|
|
|
.BR map_update_elem ()
|
|
|
|
replaces existing elements atomically.
|
|
|
|
.RE
|
|
|
|
.IP
|
2015-07-23 10:11:21 +00:00
|
|
|
Hash-table maps are
|
2015-07-22 14:45:08 +00:00
|
|
|
optimized for speed of lookup.
|
|
|
|
.TP
|
|
|
|
.B BPF_MAP_TYPE_ARRAY
|
|
|
|
.\" commit 28fbcfa08d8ed7c5a50d41a0433aad222835e8e3
|
|
|
|
Array maps have the following characteristics:
|
|
|
|
.RS
|
|
|
|
.IP * 3
|
|
|
|
Optimized for fastest possible lookup.
|
2015-07-22 20:02:27 +00:00
|
|
|
In the future the verifier/JIT compiler
|
2015-07-22 14:45:08 +00:00
|
|
|
may recognize lookup() operations that employ a constant key
|
|
|
|
and optimize them into a constant pointer.
|
|
|
|
It is possible to optimize a non-constant
|
|
|
|
key into direct pointer arithmetic as well, since pointers and
|
|
|
|
.I value_size
|
|
|
|
are constant for the life of the eBPF program.
|
|
|
|
In other words,
|
|
|
|
.BR array_map_lookup_elem ()
|
|
|
|
may be 'inlined' by the verifier/JIT compiler
|
|
|
|
while preserving concurrent access to this map from user space.
|
|
|
|
.IP *
|
|
|
|
All array elements are preallocated and zero initialized at init time.
|
|
|
|
.IP *
|
|
|
|
The key is an array index, and must be exactly four bytes.
|
|
|
|
.IP *
|
|
|
|
.BR map_delete_elem ()
|
|
|
|
fails with the error
|
|
|
|
.BR EINVAL ,
|
|
|
|
since elements cannot be deleted.
|
|
|
|
.IP *
|
|
|
|
.BR map_update_elem ()
|
2015-07-23 10:11:21 +00:00
|
|
|
replaces elements in a
|
|
|
|
.B nonatomic
|
|
|
|
fashion;
|
2015-07-23 13:36:42 +00:00
|
|
|
.\" FIXME
|
2015-07-23 10:11:21 +00:00
|
|
|
.\" Daniel Borkmann: when you have a value_size of sizeof(long), you can
|
|
|
|
.\" however use __sync_fetch_and_add() atomic builtin from the LLVM backend
|
2015-07-22 14:45:08 +00:00
|
|
|
for atomic updates, a hash-table map should be used instead.
|
|
|
|
.RE
|
|
|
|
.IP
|
|
|
|
Among the uses for array maps are the following:
|
|
|
|
.RS
|
|
|
|
.IP * 3
|
|
|
|
As "global" eBPF variables: an array of 1 element whose key is (index) 0
|
|
|
|
and where the value is a collection of 'global' variables which
|
|
|
|
eBPF programs can use to keep state between events.
|
|
|
|
.IP *
|
|
|
|
Aggregation of tracing events into a fixed set of buckets.
|
|
|
|
.RE
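Aggregating events into a fixed set of buckets usually comes down to mapping each observed value to an array index.
A minimal sketch follows, assuming power-of-two buckets;
NUM_BUCKETS
and
bucket_index()
are hypothetical names, and the function is plain C rather than verified eBPF code.

```c
#include <stdint.h>

#define NUM_BUCKETS 16  /* hypothetical max_entries of the array map */

/* Map a value to a power-of-two bucket: 0 -> 0, 1 -> 1, 2..3 -> 2,
   4..7 -> 3, and so on; all large values share the last bucket. */
static uint32_t
bucket_index(uint64_t value)
{
    uint32_t idx = 0;

    while (value != 0 && idx < NUM_BUCKETS - 1) {
        value >>= 1;
        idx++;
    }
    return idx;
}
```

The result is always a valid index for a map created with
NUM_BUCKETS
entries, so it can be used directly as the four-byte key.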
|
|
|
|
.TP
|
|
|
|
.BR BPF_MAP_TYPE_PROG_ARRAY " (since Linux 4.2)"
|
2015-07-22 20:02:27 +00:00
|
|
|
.\" FIXME we need documentation of BPF_MAP_TYPE_PROG_ARRAY
|
2015-07-22 14:45:08 +00:00
|
|
|
[To be completed]
|
|
|
|
.\"
|
|
|
|
.SS eBPF programs
|
2015-05-26 10:56:34 +00:00
|
|
|
The
|
|
|
|
.B BPF_PROG_LOAD
|
2015-06-04 11:11:19 +00:00
|
|
|
command is used to load an eBPF program into the kernel.
|
2015-06-07 15:00:20 +00:00
|
|
|
The return value for this command is a new file descriptor associated
|
2015-07-22 14:45:08 +00:00
|
|
|
with this eBPF program.
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-05-26 10:56:34 +00:00
|
|
|
.in +4n
.nf
char bpf_log_buf[LOG_BUF_SIZE];

int
bpf_prog_load(enum bpf_prog_type type,
              const struct bpf_insn *insns, int insn_cnt,
              const char *license)
{
    union bpf_attr attr = {
        .prog_type = type,
        .insns     = ptr_to_u64(insns),
        .insn_cnt  = insn_cnt,
        .license   = ptr_to_u64(license),
        .log_buf   = ptr_to_u64(bpf_log_buf),
        .log_size  = LOG_BUF_SIZE,
        .log_level = 1,
    };

    return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
.fi
.in
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.I prog_type
|
2015-03-13 19:16:32 +00:00
|
|
|
is one of the available program types:
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
.nf
enum bpf_prog_type {
    BPF_PROG_TYPE_UNSPEC,        /* Reserve 0 as invalid
                                    program type */
    BPF_PROG_TYPE_SOCKET_FILTER,
    BPF_PROG_TYPE_KPROBE,
    BPF_PROG_TYPE_SCHED_CLS,
    BPF_PROG_TYPE_SCHED_ACT,
};
.fi
.in
|
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
For further details of eBPF program types, see below.
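As a sketch of what a caller might supply as the instruction array for
BPF_PROG_LOAD,
the following encodes a two-instruction program that sets the return register to zero and exits.
The array name
trivial_prog
is hypothetical; the opcode constants are taken from
<linux/bpf.h>.
Whether the verifier accepts such a program depends on the program type, so this is an illustration of the encoding rather than a tested filter.

```c
#include <linux/bpf.h>

/* A two-instruction eBPF program: "r0 = 0; exit". */
static const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K,  /* 64-bit move of immediate */
      .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },         /* return r0 to the caller */
};

/* Number of instructions, suitable for the insn_cnt field. */
static const unsigned int trivial_prog_cnt =
    sizeof(trivial_prog) / sizeof(trivial_prog[0]);
```

Each
struct bpf_insn
occupies eight bytes, and the instruction count, not the byte count, is what the kernel expects.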
|
2015-05-24 19:31:01 +00:00
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
The remaining fields of
|
2015-05-26 10:56:34 +00:00
|
|
|
.I bpf_attr
|
|
|
|
are set as follows:
|
|
|
|
.IP * 3
|
2015-05-24 19:31:01 +00:00
|
|
|
.I insns
|
2015-05-26 10:56:34 +00:00
|
|
|
is an array of
|
2015-05-24 19:31:01 +00:00
|
|
|
.I "struct bpf_insn"
|
|
|
|
instructions.
|
2015-05-26 10:56:34 +00:00
|
|
|
.IP *
|
2015-05-24 19:31:01 +00:00
|
|
|
.I insn_cnt
|
2015-05-26 10:56:34 +00:00
|
|
|
is the number of instructions in the program referred to by
|
|
|
|
.IR insns .
|
|
|
|
.IP *
|
2015-05-24 19:31:01 +00:00
|
|
|
.I license
|
2015-05-26 10:56:34 +00:00
|
|
|
is a license string, which must be GPL compatible to call helper functions
|
2015-05-24 19:31:01 +00:00
|
|
|
marked
|
|
|
|
.IR gpl_only .
|
2015-07-23 13:36:42 +00:00
|
|
|
(The licensing rules are the same as for kernel modules,
|
|
|
|
so that dual licenses, such as "Dual BSD/GPL", may be used.)
|
2015-07-23 10:11:21 +00:00
|
|
|
.\" Daniel Borkmann commented:
|
|
|
|
.\" Not strictly. So here, the same rules apply as with kernel modules.
|
|
|
|
.\" I.e. what the kernel checks for are the following license strings:
|
|
|
|
.\"
|
|
|
|
.\" static inline int license_is_gpl_compatible(const char *license)
|
|
|
|
.\" {
|
|
|
|
.\" return (strcmp(license, "GPL") == 0
|
|
|
|
.\" || strcmp(license, "GPL v2") == 0
|
|
|
|
.\" || strcmp(license, "GPL and additional rights") == 0
|
|
|
|
.\" || strcmp(license, "Dual BSD/GPL") == 0
|
|
|
|
.\" || strcmp(license, "Dual MIT/GPL") == 0
|
|
|
|
.\" || strcmp(license, "Dual MPL/GPL") == 0);
|
|
|
|
.\" }
|
2015-05-26 10:56:34 +00:00
|
|
|
.IP *
|
2015-05-24 19:31:01 +00:00
|
|
|
.I log_buf
|
2015-05-26 10:56:34 +00:00
|
|
|
is a pointer to a caller-allocated buffer in which the in-kernel
|
|
|
|
verifier can store the verification log.
|
2015-05-24 10:00:38 +00:00
|
|
|
This log is a multi-line string that can be checked by
|
2015-03-13 19:16:32 +00:00
|
|
|
the program author in order to understand how the verifier came to
|
2015-07-23 10:11:21 +00:00
|
|
|
the conclusion that the eBPF program is unsafe.
|
2015-03-13 19:16:32 +00:00
|
|
|
The format of the output can change at any time as the verifier evolves.
|
2015-05-26 10:56:34 +00:00
|
|
|
.IP *
|
2015-05-24 19:31:01 +00:00
|
|
|
.I log_size
|
2015-05-26 10:56:34 +00:00
|
|
|
is the size of the buffer pointed to by
|
|
|
|
.IR log_buf .
|
2015-05-24 10:00:38 +00:00
|
|
|
If the size of the buffer is not large enough to store all
|
2015-03-13 19:16:32 +00:00
|
|
|
verifier messages, \-1 is returned and
|
|
|
|
.I errno
|
2015-05-24 19:31:01 +00:00
|
|
|
is set to
|
|
|
|
.BR ENOSPC .
|
2015-05-26 10:56:34 +00:00
|
|
|
.IP *
|
2015-05-24 19:31:01 +00:00
|
|
|
.I log_level
|
2015-05-24 10:00:38 +00:00
|
|
|
is the verbosity level of the verifier.
|
2015-07-23 13:36:42 +00:00
|
|
|
A value of zero means that the verifier will not provide a log;
|
|
|
|
in this case,
|
|
|
|
.I log_buf
|
|
|
|
must be a NULL pointer, and
|
|
|
|
.I log_size
|
|
|
|
must be zero.
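.PP
As a rough illustration of the rules above, the following user-space
sketch checks a license string against the strings that the kernel is
understood to accept for kernel modules, and checks that the log
fields are mutually consistent before
.B BPF_PROG_LOAD
is attempted.
Both function names are hypothetical helpers, not part of any API,
and the requirement of a non-NULL, nonempty buffer when
.I log_level
is nonzero is an assumption.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space mirror of the kernel's license check:
   returns 1 if 'license' is one of the strings treated as
   GPL compatible, 0 otherwise. */
int
license_is_gpl_compatible(const char *license)
{
    static const char *const ok[] = {
        "GPL", "GPL v2", "GPL and additional rights",
        "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
    };
    size_t i;

    for (i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
        if (strcmp(license, ok[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: returns 1 if the log-related fields are
   consistent, 0 otherwise.  With log_level zero, log_buf must be
   NULL and log_size must be zero.  Requiring a real buffer for a
   nonzero log_level is an assumption of this sketch. */
int
log_fields_consistent(unsigned int log_level, const char *log_buf,
                      unsigned int log_size)
{
    if (log_level == 0)
        return log_buf == NULL && log_size == 0;
    return log_buf != NULL && log_size > 0;
}
```
.fi
.in
.PP
A caller might run both checks on its arguments and skip the
.BR bpf ()
call entirely if either check fails.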
|
2015-03-13 19:16:32 +00:00
|
|
|
.P
|
2015-07-22 14:45:08 +00:00
|
|
|
Applying
|
|
|
|
.BR close (2)
|
|
|
|
to the file descriptor returned by
|
|
|
|
.B BPF_PROG_LOAD
|
|
|
|
will unload the eBPF program (but see NOTES).
|
|
|
|
|
2015-06-04 11:11:19 +00:00
|
|
|
Maps are accessible from eBPF programs and are used to exchange data between
|
|
|
|
eBPF programs and between eBPF programs and user-space programs.
|
2015-06-09 11:13:13 +00:00
|
|
|
For example,
|
|
|
|
eBPF programs can process various events (such as kprobes and packets) and
|
|
|
|
store their data into a map,
|
|
|
|
and user-space programs can then fetch data from the map.
|
|
|
|
Conversely, user-space programs can use a map as a configuration mechanism,
|
|
|
|
populating the map with values checked by the eBPF program,
|
|
|
|
which then modifies its behavior on the fly according to those values.
|
2015-07-23 10:11:21 +00:00
|
|
|
.\"
|
|
|
|
.\"
|
2015-07-22 14:45:08 +00:00
|
|
|
.SS eBPF program types
|
2015-07-23 10:11:21 +00:00
|
|
|
The eBPF program type
|
|
|
|
.RI ( prog_type )
|
2015-07-23 13:36:42 +00:00
|
|
|
determines the subset of kernel helper functions that the program
|
2015-07-23 10:11:21 +00:00
|
|
|
may call.
|
2015-07-23 13:36:42 +00:00
|
|
|
The program type also determines the program input (context)\(emthe
|
2015-07-23 10:11:21 +00:00
|
|
|
format of
|
|
|
|
.I "struct bpf_context"
|
2015-07-22 14:45:08 +00:00
|
|
|
(which is the data blob passed into the eBPF program as the first argument).
|
2015-07-23 10:11:21 +00:00
|
|
|
|
|
|
|
For example, a tracing program does not have the exact same
|
|
|
|
subset of helper functions as a socket filter program
|
|
|
|
(though they may have some helpers in common).
|
|
|
|
Similarly,
|
|
|
|
the input (context) for a tracing program is a set of register values,
|
|
|
|
while for a socket filter it is a network packet.
|
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
The set of functions available to eBPF programs of a given type may increase
|
|
|
|
in the future.
|
|
|
|
|
|
|
|
The following program types are supported:
|
|
|
|
.TP
|
|
|
|
.BR BPF_PROG_TYPE_SOCKET_FILTER " (since Linux 3.19)"
|
|
|
|
Currently, the set of functions for
|
|
|
|
.B BPF_PROG_TYPE_SOCKET_FILTER
|
|
|
|
is:
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-05-24 19:31:01 +00:00
|
|
|
.in +4n
|
2015-03-13 19:16:32 +00:00
|
|
|
.nf
|
2015-07-22 14:45:08 +00:00
|
|
|
bpf_map_lookup_elem(map_fd, void *key)
|
|
|
|
/* look up key in a map_fd */
|
|
|
|
bpf_map_update_elem(map_fd, void *key, void *value)
|
|
|
|
/* update key/value */
|
|
|
|
bpf_map_delete_elem(map_fd, void *key)
|
|
|
|
/* delete key in a map_fd */
|
2015-03-13 19:16:32 +00:00
|
|
|
.fi
|
2015-05-24 19:31:01 +00:00
|
|
|
.in
|
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
The
|
|
|
|
.I bpf_context
|
|
|
|
argument is a pointer to a
|
2015-07-23 09:56:44 +00:00
|
|
|
.IR "struct __sk_buff" .
|
2015-07-23 10:11:21 +00:00
|
|
|
.\" FIXME: We need some text here to explain how the program
|
2015-07-23 09:56:44 +00:00
|
|
|
.\" accesses __sk_buff
|
|
|
|
.\" See 'struct __sk_buff' and commit 9bac3d6d548e5
|
|
|
|
.\" Alexei commented:
|
|
|
|
.\" Actually now in case of SOCKET_FILTER, SCHED_CLS, SCHED_ACT
|
|
|
|
.\" the program can now access skb fields.
|
2015-07-22 14:45:08 +00:00
|
|
|
.\"
|
|
|
|
.TP
|
|
|
|
.BR BPF_PROG_TYPE_KPROBE " (since Linux 4.1)"
|
|
|
|
.\" commit 2541517c32be2531e0da59dfd7efc1ce844644f5
|
|
|
|
[To be documented]
|
|
|
|
.\" FIXME Document this program type
|
|
|
|
.\" Describe allowed helper functions for this program type
|
|
|
|
.\" Describe bpf_context for this program type
|
|
|
|
.\" FIXME We need text here to describe 'kern_version'
|
|
|
|
.TP
|
|
|
|
.BR BPF_PROG_TYPE_SCHED_CLS " (since Linux 4.1)"
|
|
|
|
.\" commit 96be4325f443dbbfeb37d2a157675ac0736531a1
|
|
|
|
.\" commit e2e9b6541dd4b31848079da80fe2253daaafb549
|
|
|
|
[To be documented]
|
|
|
|
.\" FIXME Document this program type
|
|
|
|
.\" Describe allowed helper functions for this program type
|
|
|
|
.\" Describe bpf_context for this program type
|
|
|
|
.TP
|
|
|
|
.BR BPF_PROG_TYPE_SCHED_ACT " (since Linux 4.1)"
|
|
|
|
.\" commit 94caee8c312d96522bcdae88791aaa9ebcd5f22c
|
|
|
|
.\" commit a8cb5f556b567974d75ea29c15181c445c541b1f
|
|
|
|
[To be documented]
|
|
|
|
.\" FIXME Document this program type
|
|
|
|
.\" Describe allowed helper functions for this program type
|
|
|
|
.\" Describe bpf_context for this program type
|
|
|
|
.SS Events
|
|
|
|
Once a program is loaded, it can be attached to an event.
|
|
|
|
Various kernel subsystems have different ways to do so.
|
|
|
|
|
|
|
|
Since Linux 3.19,
|
|
|
|
.\" commit 89aa075832b0da4402acebd698d0411dcc82d03e
|
|
|
|
the following call will attach the program
|
2015-03-13 19:16:32 +00:00
|
|
|
.I prog_fd
|
2015-05-26 10:56:34 +00:00
|
|
|
to the socket
|
|
|
|
.IR sockfd ,
|
2015-07-22 14:45:08 +00:00
|
|
|
which was created by an earlier call to
|
|
|
|
.BR socket (2):
|
2015-05-24 19:31:01 +00:00
|
|
|
|
|
|
|
.in +4n
|
2015-03-13 19:16:32 +00:00
|
|
|
.nf
|
2015-07-22 14:45:08 +00:00
|
|
|
setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
|
|
|
|
&prog_fd, sizeof(prog_fd));
|
2015-03-13 19:16:32 +00:00
|
|
|
.fi
|
2015-05-24 19:31:01 +00:00
|
|
|
.in
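.PP
The call above can be wrapped so that failures are reported to the
caller.
This is a minimal sketch: the function name is hypothetical, and the
fallback value 50 for
.B SO_ATTACH_BPF
is an assumption that holds only on architectures using the generic
socket option numbering.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50    /* assumed generic value; older headers */
#endif

/* Attach the eBPF program referred to by 'prog_fd' to the socket
   'sockfd'.  Returns 0 on success, or -1 with errno set by
   setsockopt(2). */
int
attach_prog_to_socket(int sockfd, int prog_fd)
{
    if (setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                   &prog_fd, sizeof(prog_fd)) == -1)
        return -1;
    return 0;
}
```
.fi
.in
.PP
If
.I sockfd
is not an open file descriptor, the wrapper returns \-1 with
.I errno
set to
.BR EBADF .
.PP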
|
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
Since Linux 4.1,
|
|
|
|
.\" commit 2541517c32be2531e0da59dfd7efc1ce844644f5
|
|
|
|
the following call may be used to attach
|
|
|
|
the eBPF program referred to by the file descriptor
|
2015-03-13 19:16:32 +00:00
|
|
|
.I prog_fd
|
2015-07-22 14:45:08 +00:00
|
|
|
to a perf event file descriptor,
|
|
|
|
.IR event_fd ,
|
|
|
|
that was created by a previous call to
|
|
|
|
.BR perf_event_open (2):
|
2015-03-13 19:16:32 +00:00
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
.in +4n
|
|
|
|
.nf
|
|
|
|
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
|
|
|
|
.fi
|
|
|
|
.in
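.PP
The perf event case can be wrapped in the same way.
This is a sketch only: the function name is hypothetical, and it
assumes kernel headers recent enough for
.I <linux/perf_event.h>
to define
.BR PERF_EVENT_IOC_SET_BPF .
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

/* Attach the eBPF program referred to by 'prog_fd' to the perf
   event 'event_fd'.  Returns 0 on success, or -1 with errno set
   by ioctl(2). */
int
attach_prog_to_perf_event(int event_fd, int prog_fd)
{
    if (ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) == -1)
        return -1;
    return 0;
}
```
.fi
.in
.PP
As with the socket wrapper, passing a descriptor that is not open
yields \-1 with
.I errno
set to
.BR EBADF .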
|
|
|
|
.\"
|
|
|
|
.\"
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH EXAMPLES
|
|
|
|
.nf
|
|
|
|
/* bpf+sockets example:
|
|
|
|
* 1. create array map of 256 elements
|
|
|
|
* 2. load program that counts number of packets received
|
|
|
|
* r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
|
|
|
|
* map[r0]++
|
|
|
|
* 3. attach prog_fd to raw socket via setsockopt()
|
|
|
|
* 4. print number of received TCP/UDP packets every second
|
|
|
|
*/
|
2015-05-26 10:56:34 +00:00
|
|
|
int
|
|
|
|
main(int argc, char **argv)
|
2015-03-13 19:16:32 +00:00
|
|
|
{
|
|
|
|
int sock, map_fd, prog_fd, key;
|
|
|
|
long long value = 0, tcp_cnt, udp_cnt;
|
|
|
|
|
2015-05-24 19:31:01 +00:00
|
|
|
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
|
|
|
|
sizeof(value), 256);
|
2015-03-13 19:16:32 +00:00
|
|
|
if (map_fd < 0) {
|
|
|
|
printf("failed to create map '%s'\\n", strerror(errno));
|
|
|
|
/* likely not run as root */
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct bpf_insn prog[] = {
|
2015-05-24 19:31:01 +00:00
|
|
|
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */
|
|
|
|
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
|
|
|
|
/* r0 = ip->proto */
|
|
|
|
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
|
|
|
|
/* *(u32 *)(fp - 4) = r0 */
|
|
|
|
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
|
|
|
|
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */
|
|
|
|
BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */
|
|
|
|
BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
|
|
|
|
/* r0 = map_lookup(r1, r2) */
|
|
|
|
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
|
|
|
|
/* if (r0 == 0) goto pc+2 */
|
|
|
|
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
|
|
|
|
BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
|
|
|
|
/* lock *(u64 *) r0 += r1 */
|
2015-07-22 17:59:35 +00:00
|
|
|
.\" == atomic64_add
|
2015-05-24 19:31:01 +00:00
|
|
|
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
|
|
|
|
BPF_EXIT_INSN(), /* return r0 */
|
2015-03-13 19:16:32 +00:00
|
|
|
};
|
|
|
|
|
2015-05-24 19:31:01 +00:00
|
|
|
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
|
|
|
|
sizeof(prog) / sizeof(prog[0]), "GPL");
|
2015-03-13 19:16:32 +00:00
|
|
|
|
|
|
|
sock = open_raw_sock("lo");
|
|
|
|
|
2015-05-24 19:31:01 +00:00
|
|
|
assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
|
|
|
|
sizeof(prog_fd)) == 0);
|
2015-03-13 19:16:32 +00:00
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
key = IPPROTO_TCP;
|
|
|
|
assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
|
|
|
|
key = IPPROTO_UDP;
|
|
|
|
assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
|
|
|
|
printf("TCP %lld UDP %lld packets\\n", tcp_cnt, udp_cnt);
|
|
|
|
sleep(1);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
.fi
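.PP
The example calls
.IR open_raw_sock ()
without defining it.
One possible implementation, shown here as an assumption rather than
as the helper the example was written against, creates a packet
socket and binds it to the named interface.
Creating such a socket normally requires the
.B CAP_NET_RAW
capability.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of an open_raw_sock() helper: create a packet socket
   and bind it to the interface 'name'.  Returns the socket file
   descriptor, or -1 on error. */
int
open_raw_sock(const char *name)
{
    struct sockaddr_ll sll;
    unsigned int ifindex;
    int sock;

    ifindex = if_nametoindex(name);
    if (ifindex == 0)           /* no such interface */
        return -1;

    sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sock == -1)             /* typically needs CAP_NET_RAW */
        return -1;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_ifindex = ifindex;
    sll.sll_protocol = htons(ETH_P_ALL);
    if (bind(sock, (struct sockaddr *) &sll, sizeof(sll)) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}
```
.fi
.in
.PP
Because the helper returns \-1 on failure, a caller should check the
result before passing it to
.BR setsockopt (2).
.PP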
|
2015-06-09 11:13:13 +00:00
|
|
|
|
2015-07-22 14:45:08 +00:00
|
|
|
Some complete working code can be found in the
|
2015-06-09 11:13:13 +00:00
|
|
|
.I samples/bpf
|
|
|
|
directory in the kernel source tree.
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH RETURN VALUE
|
|
|
|
For a successful call, the return value depends on the operation:
|
|
|
|
.TP
|
|
|
|
.B BPF_MAP_CREATE
|
2015-07-22 14:45:08 +00:00
|
|
|
The new file descriptor associated with the eBPF map.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B BPF_PROG_LOAD
|
2015-06-04 11:11:19 +00:00
|
|
|
The new file descriptor associated with the eBPF program.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
All other commands
|
|
|
|
Zero.
|
|
|
|
.PP
|
|
|
|
On error, \-1 is returned, and
|
|
|
|
.I errno
|
|
|
|
is set appropriately.
|
|
|
|
.SH ERRORS
|
|
|
|
.TP
|
|
|
|
.B EPERM
|
2015-05-24 19:47:35 +00:00
|
|
|
The call was made without sufficient privilege
|
2015-03-13 19:16:32 +00:00
|
|
|
(without the
|
|
|
|
.B CAP_SYS_ADMIN
|
|
|
|
capability).
|
|
|
|
.TP
|
|
|
|
.B ENOMEM
|
|
|
|
Cannot allocate sufficient memory.
|
|
|
|
.TP
|
|
|
|
.B EBADF
|
|
|
|
.I fd
|
|
|
|
is not an open file descriptor.
|
|
|
|
.TP
|
|
|
|
.B EFAULT
|
2015-05-24 19:31:01 +00:00
|
|
|
One of the pointers
|
|
|
|
.RI ( key
|
2015-03-13 19:16:32 +00:00
|
|
|
or
|
|
|
|
.I value
|
|
|
|
or
|
|
|
|
.I log_buf
|
|
|
|
or
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR insns )
|
|
|
|
is outside the accessible address space.
|
2015-03-13 19:16:32 +00:00
|
|
|
.TP
|
|
|
|
.B EINVAL
|
|
|
|
The value specified in
|
|
|
|
.I cmd
|
|
|
|
is not recognized by this kernel.
|
|
|
|
.TP
|
|
|
|
.B EINVAL
|
|
|
|
For
|
|
|
|
.BR BPF_MAP_CREATE ,
|
|
|
|
either
|
|
|
|
.I map_type
|
|
|
|
or attributes are invalid.
|
|
|
|
.TP
|
|
|
|
.B EINVAL
|
|
|
|
For
|
|
|
|
.BR BPF_MAP_*_ELEM
|
|
|
|
commands,
|
2015-05-24 19:31:01 +00:00
|
|
|
some of the fields of
|
|
|
|
.I "union bpf_attr"
|
|
|
|
that are not used by this command
|
2015-03-13 19:16:32 +00:00
|
|
|
are not set to zero.
|
|
|
|
.TP
|
|
|
|
.B EINVAL
|
|
|
|
For
|
|
|
|
.BR BPF_PROG_LOAD ,
|
2015-05-24 10:00:38 +00:00
|
|
|
indicates an attempt to load an invalid program.
|
2015-07-23 10:11:21 +00:00
|
|
|
eBPF programs can be deemed
|
|
|
|
invalid due to unrecognized instructions, the use of reserved fields, jumps
|
2015-03-13 19:16:32 +00:00
|
|
|
out of range, infinite loops or calls of unknown functions.
|
|
|
|
.TP
|
|
|
|
.BR EACCES
|
|
|
|
For
|
|
|
|
.BR BPF_PROG_LOAD ,
|
|
|
|
even though all program instructions are valid, the program has been
|
2015-05-24 10:00:38 +00:00
|
|
|
rejected because it was deemed unsafe.
|
|
|
|
This may be because it could have
|
2015-03-13 19:16:32 +00:00
|
|
|
accessed a disallowed memory region or an uninitialized stack/register or
|
2015-05-26 10:56:34 +00:00
|
|
|
because the function constraints don't match the actual types or because
|
2015-03-13 19:16:32 +00:00
|
|
|
there was a misaligned memory access.
|
2015-05-24 19:47:35 +00:00
|
|
|
In this case, it is recommended to call
|
2015-05-24 19:31:01 +00:00
|
|
|
.BR bpf ()
|
|
|
|
again with
|
2015-03-13 19:16:32 +00:00
|
|
|
.IR log_level " = 1"
|
|
|
|
and examine
|
|
|
|
.I log_buf
|
|
|
|
for the specific reason provided by the verifier.
|
|
|
|
.TP
|
|
|
|
.BR ENOENT
|
|
|
|
For
|
|
|
|
.B BPF_MAP_LOOKUP_ELEM
|
|
|
|
or
|
2015-05-25 17:49:29 +00:00
|
|
|
.BR BPF_MAP_DELETE_ELEM ,
|
2015-03-13 19:16:32 +00:00
|
|
|
indicates that the element with the given
|
|
|
|
.I key
|
|
|
|
was not found.
|
|
|
|
.TP
|
|
|
|
.BR E2BIG
|
2015-07-23 10:11:21 +00:00
|
|
|
The eBPF program is too large or a map reached the
|
2015-03-13 19:16:32 +00:00
|
|
|
.I max_entries
|
2015-05-26 06:34:12 +00:00
|
|
|
limit (maximum number of elements).
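.PP
A program that loads eBPF code may want to turn these error numbers
into short hints for the user.
The sketch below is one way to do so for a failed
.BR BPF_PROG_LOAD ;
the function name and the wording of the messages are hypothetical,
and the
.B ENOSPC
case is taken from the description of
.I log_size
rather than from the list above.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map an errno value from a failed
   BPF_PROG_LOAD to a short hint. */
const char *
prog_load_hint(int err)
{
    switch (err) {
    case EPERM:  return "insufficient privilege (CAP_SYS_ADMIN)";
    case EINVAL: return "invalid program";
    case EACCES: return "program rejected as unsafe; check log_buf";
    case E2BIG:  return "program too large";
    case ENOSPC: return "log_buf too small for verifier output";
    case EFAULT: return "bad pointer in bpf_attr";
    case ENOMEM: return "out of memory";
    default:     return "unexpected error";
    }
}
```
.fi
.in
.PP
A caller would typically pass the saved value of
.I errno
immediately after the failed call, before any other library call
can change it.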
|
2015-05-26 07:48:56 +00:00
|
|
|
.SH VERSIONS
|
|
|
|
The
|
|
|
|
.BR bpf ()
|
|
|
|
system call first appeared in Linux 3.18.
|
2015-05-26 07:49:45 +00:00
|
|
|
.SH CONFORMING TO
|
|
|
|
The
|
|
|
|
.BR bpf ()
|
|
|
|
system call is Linux-specific.
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH NOTES
|
2015-05-26 10:56:34 +00:00
|
|
|
In the current implementation, all
|
|
|
|
.BR bpf ()
|
|
|
|
commands require the caller to have the
|
2015-03-13 19:16:32 +00:00
|
|
|
.B CAP_SYS_ADMIN
|
2015-05-26 10:56:34 +00:00
|
|
|
capability.
|
2015-07-22 12:53:08 +00:00
|
|
|
|
|
|
|
eBPF objects (maps and programs) can be shared between processes.
|
|
|
|
For example, after
|
|
|
|
.BR fork (2),
|
|
|
|
the child inherits file descriptors referring to the same eBPF objects.
|
|
|
|
In addition, file descriptors referring to eBPF objects can be
|
|
|
|
transferred over UNIX domain sockets.
|
|
|
|
File descriptors referring to eBPF objects can be duplicated
|
|
|
|
in the usual way, using
|
|
|
|
.BR dup (2)
|
|
|
|
and similar calls.
|
|
|
|
An eBPF object is deallocated only after all file descriptors
|
|
|
|
referring to the object have been closed.
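.PP
Transferring a file descriptor over a UNIX domain socket uses the
.B SCM_RIGHTS
ancillary message described in
.BR unix (7).
The following sketch works for any file descriptor, including one
that refers to an eBPF map or program; the function names are
hypothetical.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send file descriptor 'fd' over the UNIX domain socket 'sock'.
   Returns 0 on success, -1 on error. */
int
send_fd(int sock, int fd)
{
    char dummy = 'x';           /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces suitable alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a file descriptor sent by send_fd().
   Returns the new descriptor, or -1 on error. */
int
recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```
.fi
.in
.PP
The descriptor returned by the receiver is a new descriptor that
refers to the same object, so the object remains allocated until
both the sender and the receiver have closed their descriptors.
.PP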
|
|
|
|
|
2015-07-22 17:59:35 +00:00
|
|
|
eBPF programs can be written in a restricted C that is compiled (using the
|
|
|
|
.B clang
|
2015-07-23 10:11:21 +00:00
|
|
|
compiler) into eBPF bytecode.
|
|
|
|
Various features are omitted from this restricted C, such as loops,
|
2015-07-22 12:53:08 +00:00
|
|
|
global variables, variadic functions, floating-point numbers,
|
2015-07-23 10:11:21 +00:00
|
|
|
and passing structures as function arguments.
|
2015-07-22 17:59:35 +00:00
|
|
|
Some examples can be found in the
|
|
|
|
.I samples/bpf/*_kern.c
|
|
|
|
files in the kernel source tree.
|
2015-07-22 14:45:08 +00:00
|
|
|
.\" There are also examples for the tc classifier, in the iproute2
|
|
|
|
.\" project, in examples/bpf
|
2015-07-23 10:11:21 +00:00
|
|
|
|
|
|
|
The kernel contains a just-in-time (JIT) compiler that translates
|
|
|
|
eBPF bytecode into native machine code for better performance.
|
|
|
|
The JIT compiler is disabled by default,
|
|
|
|
but its operation can be controlled by writing one of the
|
|
|
|
following integer strings to the file
|
|
|
|
.IR /proc/sys/net/core/bpf_jit_enable :
|
|
|
|
.IP 0 3
|
|
|
|
Disable JIT compilation (default).
|
|
|
|
.IP 1
|
|
|
|
Normal compilation.
|
|
|
|
.IP 2
|
|
|
|
Debugging mode.
|
|
|
|
The generated opcodes are dumped in hexadecimal into the kernel log.
|
|
|
|
These opcodes can then be disassembled using the program
|
|
|
|
.IR tools/net/bpf_jit_disasm.c
|
|
|
|
provided in the kernel source tree.
|
2015-07-23 13:36:42 +00:00
|
|
|
.PP
|
|
|
|
The JIT compiler for eBPF is currently available for the x86-64, arm64,
|
|
|
|
and s390 architectures.
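.PP
The current JIT setting can be inspected from a program by reading
the file named above.
The following is a minimal sketch with hypothetical function names;
it assumes that the file holds a single decimal integer.
.PP
.in +4n
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe a bpf_jit_enable value. */
const char *
jit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "enabled";
    case 2:  return "enabled (debug)";
    default: return "unknown";
    }
}

/* Read the integer stored in 'path'.  Returns that integer, or
   -1 if the file cannot be opened or parsed. */
int
read_jit_mode(const char *path)
{
    FILE *fp;
    int mode;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return mode;
}
```
.fi
.in
.PP
A caller would pass
.I /proc/sys/net/core/bpf_jit_enable
as
.I path
and hand the result to
.IR jit_mode_name ().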
|
2015-03-13 19:16:32 +00:00
|
|
|
.SH SEE ALSO
|
2015-05-26 10:56:34 +00:00
|
|
|
.BR seccomp (2),
|
2015-06-07 13:33:34 +00:00
|
|
|
.BR socket (7),
|
2015-07-22 15:58:46 +00:00
|
|
|
.BR tc (8),
|
|
|
|
.BR tc-bpf (8)
|
2015-05-26 10:56:34 +00:00
|
|
|
|
2015-05-24 19:47:35 +00:00
|
|
|
Both classic and extended BPF are explained in the kernel source file
|
2015-05-24 19:31:01 +00:00
|
|
|
.IR Documentation/networking/filter.txt .
|