.\" Copyright (C) 2015 Alexei Starovoitov <ast@kernel.org>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH BPF 2 2015-03-10 "Linux" "Linux Programmer's Manual"
.SH NAME
bpf \- perform a command on an extended BPF map or program
.SH SYNOPSIS
.nf
.B #include <linux/bpf.h>
.sp
.BI "int bpf(int " cmd ", union bpf_attr *" attr ", unsigned int " size );
.fi
.SH DESCRIPTION
The
.BR bpf ()
system call performs a range of operations related to the extended
Berkeley Packet Filter, which can be characterized as
"a universal in-kernel virtual machine".
Extended BPF (eBPF) is similar to
the original ("classic") BPF used to filter network packets.
For both, the kernel statically analyzes programs before loading them,
to ensure that they cannot harm the running system.
.P
eBPF extends classic BPF in multiple ways, including the ability to call
in-kernel helper functions and to access shared data structures such as BPF maps.
The programs can be written in a restricted C that is compiled into
eBPF bytecode and executed on the in-kernel virtual machine or
just-in-time compiled into native code.
.SS Extended BPF Design/Architecture
.P
BPF maps are generic data structures for storing data of different types.
A user process can create multiple maps (with key/value pairs being
opaque bytes of data) and access them via file descriptors.
BPF programs can access the same maps from inside the kernel in parallel.
It's up to the user process and BPF program to decide what they store
inside maps.
.P
BPF programs are similar to kernel modules.
They are loaded by the user
process and automatically unloaded when the process exits.
Each BPF program is a set of instructions that is safe to run until
its completion.
The BPF verifier statically determines that the program
terminates and is safe to execute.
During verification, the kernel takes a reference to each map that the
program intends to use, so those maps cannot be removed until the
program is unloaded.
The program can be attached to different events.
These events can be packets, tracing
events, and other types that may be added in the future.
A new event triggers
execution of the program, which may store information about the event in maps.
Beyond storing data, programs may call in-kernel helper functions.
The same program can be attached to multiple events, and different programs can
access the same map:
.PP
.in +4n
.nf
tracing     tracing     tracing     packet     packet
event A     event B     event C     on eth0    on eth1
 |             |          |           |          |
 |             |          |           |          |
 --> tracing <--       tracing      socket     socket
      prog_1           prog_2       prog_3     prog_4
      |  |               |            |
   |---  -----|  |-------|           map_3
 map_1       map_2
.fi
.in
.SS Arguments
The operation to be performed by the
.BR bpf ()
system call is determined by the
.I cmd
argument, which can be one of the following:
.TP
.B BPF_MAP_CREATE
Create a map with the given type and attributes and return a file
descriptor that refers to the map.
.TP
.B BPF_MAP_LOOKUP_ELEM
Look up an element by key in a specified map and return its value.
.TP
.B BPF_MAP_UPDATE_ELEM
Create or update an element (key/value pair) in a specified map.
.TP
.B BPF_MAP_DELETE_ELEM
Look up and delete an element by key in a specified map.
.TP
.B BPF_MAP_GET_NEXT_KEY
Look up an element by key in a specified map and return the key
of the next element.
.TP
.B BPF_PROG_LOAD
Verify and load a BPF program.
.PP
The
.I attr
argument is a pointer to a union of type
.I bpf_attr
as defined below;
.I size
is the size of the union.
.PP
.in +4n
.nf
union bpf_attr {
    struct {    /* Used by BPF_MAP_CREATE */
        __u32         map_type;
        __u32         key_size;    /* size of key in bytes */
        __u32         value_size;  /* size of value in bytes */
        __u32         max_entries; /* maximum number of entries
                                      in a map */
    };

    struct {    /* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY */
        __u32         map_fd;
        __aligned_u64 key;
        union {
            __aligned_u64 value;
            __aligned_u64 next_key;
        };
        __u64         flags;
    };

    struct {    /* Used by BPF_PROG_LOAD */
        __u32         prog_type;
        __u32         insn_cnt;
        __aligned_u64 insns;      /* 'const struct bpf_insn *' */
        __aligned_u64 license;    /* 'const char *' */
        __u32         log_level;  /* verbosity level of verifier */
        __u32         log_size;   /* size of user buffer */
        __aligned_u64 log_buf;    /* user supplied 'char *'
                                     buffer */
    };
} __attribute__((aligned(8)));
.fi
.in
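.PP
The wrapper functions in this page call a function named
.BR bpf ()
and a helper named
.BR ptr_to_u64 (),
neither of which is defined here.
The following is a minimal sketch of both, assuming that the C library
provides no wrapper for this system call (so it is invoked through
.BR syscall (2)
with
.BR __NR_bpf )
and that
.BR ptr_to_u64 ()
only needs to turn a user-space pointer into the 64-bit integer stored in the
.I __aligned_u64
fields of
.IR "union bpf_attr" .
.PP
.nf
```c
#include <linux/bpf.h>      /* union bpf_attr, BPF_* command values */
#include <stdint.h>         /* uintptr_t */
#include <sys/syscall.h>    /* __NR_bpf */
#include <unistd.h>         /* syscall() */

/* Convert a pointer to the __u64 representation used by union bpf_attr. */
static __u64 ptr_to_u64(const void *ptr)
{
    return (__u64) (uintptr_t) ptr;
}

/* Invoke the raw system call; returns -1 and sets errno on failure. */
static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}
```
.fi
.PP
Going through
.BR syscall (2)
keeps the sketch independent of any BPF library; the return value and
.I errno
are passed through unchanged.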
.SS BPF maps
Maps are a generic data structure for storing data of different types
and for sharing data between the kernel and user space.
.PP
Each map has the following attributes:
.PD 0
.IP * 3
type
.IP *
maximum number of elements
.IP *
key size in bytes
.IP *
value size in bytes
.PD
.PP
The following wrapper functions demonstrate how this system
call can be used to access the maps.
The functions use the
.I cmd
argument to invoke different operations.
.TP
.B BPF_MAP_CREATE
.nf
int bpf_create_map(enum bpf_map_type map_type, int key_size,
                   int value_size, int max_entries)
{
    union bpf_attr attr = {
        .map_type    = map_type,
        .key_size    = key_size,
        .value_size  = value_size,
        .max_entries = max_entries
    };

    return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
.fi
.IP
The
.B BPF_MAP_CREATE
command creates a map of type
.I map_type
with the attributes
.IR key_size ,
.IR value_size ,
and
.IR max_entries .
On success, it returns a new file descriptor, local to the calling
process, that refers to the map.
On error, \-1 is returned and
.I errno
is set to
.BR EINVAL ,
.BR EPERM ,
or
.BR ENOMEM .
.IP
The
.I key_size
and
.I value_size
attributes will be used by the verifier during program loading to check
that the program is calling the
.BR bpf_map_*_elem ()
helper functions with a correctly initialized
.I key
and that the program doesn't access the map element
.I value
beyond the specified
.IR value_size .
For example, when a map is created with a
.I key_size
of 8 and the program calls
.IP
.in +4n
.nf
bpf_map_lookup_elem(map_fd, fp - 4)
.fi
.in
.IP
the program will be rejected,
since the in-kernel helper function
.IP
.in +4n
.nf
bpf_map_lookup_elem(map_fd, void *key)
.fi
.in
.IP
expects to read 8 bytes from the location pointed to by
.IR key ,
but an 8-byte read starting at address
.I "fp\ -\ 4"
(where
.I fp
is the frame pointer marking the top of the program's stack)
would extend 4 bytes past the top of the stack, which is an
out-of-bounds stack access.
.IP
Similarly, when a map is created with a
.I value_size
of 1 and the program calls
.IP
.in +4n
.nf
value = bpf_map_lookup_elem(...);
*(u32 *) value = 1;
.fi
.in
.IP
the program will be rejected, since the 4-byte store accesses the
.I value
pointer beyond the specified 1-byte
.I value_size
limit.
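.IP
The bounds arithmetic behind both rejections can be sketched as follows.
This is a simplified model, not the verifier itself: it assumes the
512-byte eBPF stack limit and checks only that the accessed byte range
lies inside the stack (below
.IR fp )
or inside the
.I value_size
bytes of a map value; the real verifier also checks, among other things,
that the stack bytes passed as a key have been initialized.
The function and constant names are hypothetical.
.IP
.nf
```c
#include <stdbool.h>

/* Assumed stack limit: a program may use at most 512 bytes below fp. */
#define MODEL_STACK_SIZE 512

/* Access of 'size' bytes at fp + off must stay within [fp - 512, fp). */
static bool model_stack_access_ok(int off, int size)
{
    return size > 0 && off >= -MODEL_STACK_SIZE && off + size <= 0;
}

/* Access of 'size' bytes at value + off must stay within value_size. */
static bool model_value_access_ok(int off, int size, int value_size)
{
    return size > 0 && off >= 0 && off + size <= value_size;
}
```
.fi
.IP
With a
.I key_size
of 8,
.I "model_stack_access_ok(\-4, 8)"
is false because the read would cover
.I "fp\ \-\ 4"
up to
.IR "fp\ +\ 3" ,
while
.I "model_stack_access_ok(\-8, 8)"
is true; with a
.I value_size
of 1, the 4-byte store gives
.IR "model_value_access_ok(0, 4, 1)" ,
which is false.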
.IP
Currently, two
.I map_type
values are supported:
.IP
.in +4n
.nf
enum bpf_map_type {
    BPF_MAP_TYPE_UNSPEC,
    BPF_MAP_TYPE_HASH,
    BPF_MAP_TYPE_ARRAY,
};
.fi
.in
.IP
.I map_type
selects one of the available map implementations in the kernel.
For all map types,
programs access maps with the same
.BR bpf_map_lookup_elem ()
and
.BR bpf_map_update_elem ()
helper functions.
.TP
.B BPF_MAP_LOOKUP_ELEM
.nf
int bpf_lookup_elem(int fd, void *key, void *value)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
        .value  = ptr_to_u64(value),
    };

    return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
.fi
.IP
The
.B BPF_MAP_LOOKUP_ELEM
command looks up an element with a given
.I key
in the map referred to by the file descriptor
.IR fd .
If an element is found,
the operation returns zero and stores the element's value into
.IR value ,
which must point to a buffer of
.I value_size
bytes.
If no element is found, the operation returns \-1 and sets
.I errno
to
.BR ENOENT .
.TP
.B BPF_MAP_UPDATE_ELEM
.nf
int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
        .value  = ptr_to_u64(value),
        .flags  = flags,
    };

    return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
.fi
.IP
The
.B BPF_MAP_UPDATE_ELEM
command creates or updates an element with a given
.I key/value
in the map referred to by the file descriptor
.IR fd ,
according to
.IR flags ,
which can have one of three possible values:
.IP
.in +4n
.nf
#define BPF_ANY       0 /* create new element or update existing */
#define BPF_NOEXIST   1 /* create new element only if it didn't exist */
#define BPF_EXIST     2 /* update existing element */
.fi
.in
.IP
On success, the operation returns zero.
On error, \-1 is returned and
.I errno
is set to
.BR EINVAL ,
.BR EPERM ,
.BR ENOMEM ,
.BR E2BIG ,
.BR EEXIST ,
or
.BR ENOENT .
.B E2BIG
indicates that the number of elements in the map reached the
.I max_entries
limit specified at map creation time.
.B EEXIST
will be returned if
.I flags
is
.B BPF_NOEXIST
and the element with
.I key
already exists in the map.
.B ENOENT
will be returned if
.I flags
is
.B BPF_EXIST
and the element with
.I key
doesn't exist in the map.
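.IP
The flag rules above can be summarized as a small decision function.
The following is a simplified user-space model, not kernel code: the
.B MODEL_*
constants are hypothetical local copies of
.BR BPF_ANY ,
.BR BPF_NOEXIST ,
and
.BR BPF_EXIST ;
the
.I exists
and
.I full
arguments stand for state that the kernel determines itself; and the
treatment of unknown flag values as
.B EINVAL
as well as the order of the checks are illustrative assumptions.
.IP
.nf
```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical local copies of BPF_ANY, BPF_NOEXIST and BPF_EXIST. */
enum { MODEL_BPF_ANY = 0, MODEL_BPF_NOEXIST = 1, MODEL_BPF_EXIST = 2 };

/*
 * Simplified model of the update rules described above.
 * exists: the key is already present in the map.
 * full:   the map already holds max_entries elements.
 * Returns 0 if the update would be allowed, otherwise a negated errno.
 */
static int model_update_check(bool exists, bool full, unsigned long long flags)
{
    if (flags > MODEL_BPF_EXIST)
        return -EINVAL;          /* unknown flag value */
    if (exists && flags == MODEL_BPF_NOEXIST)
        return -EEXIST;          /* BPF_NOEXIST: must not exist yet */
    if (!exists && flags == MODEL_BPF_EXIST)
        return -ENOENT;          /* BPF_EXIST: must already exist */
    if (!exists && full)
        return -E2BIG;           /* a new element would exceed max_entries */
    return 0;                    /* create, or replace in place */
}
```
.fi
.IP
Replacing an existing element never needs a new slot, so in this model a
full map still accepts
.B BPF_ANY
or
.B BPF_EXIST
updates of keys that are already present.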
.TP
.B BPF_MAP_DELETE_ELEM
.nf
int bpf_delete_elem(int fd, void *key)
{
    union bpf_attr attr = {
        .map_fd = fd,
        .key    = ptr_to_u64(key),
    };

    return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
.fi
.IP
The
.B BPF_MAP_DELETE_ELEM
command deletes the element whose key is
.I key
from the map referred to by the file descriptor
.IR fd .
On success, zero is returned.
If the element is not found, \-1 is returned and
.I errno
is set to
.BR ENOENT .
.TP
.B BPF_MAP_GET_NEXT_KEY
.nf
int bpf_get_next_key(int fd, void *key, void *next_key)
{
    union bpf_attr attr = {
        .map_fd   = fd,
        .key      = ptr_to_u64(key),
        .next_key = ptr_to_u64(next_key),
    };

    return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
.fi
.IP
The
.B BPF_MAP_GET_NEXT_KEY
command looks up an element by
.I key
in the map referred to by the file descriptor
.I fd
and stores the key of the next element in the buffer pointed to by
.IR next_key .
If
.I key
is not found, the operation returns zero and stores the key of the
first element in
.IR next_key .
If
.I key
is the last element, \-1 is returned and
.I errno
is set to
.BR ENOENT .
Other possible
.I errno
values are
.BR ENOMEM ,
.BR EFAULT ,
.BR EPERM ,
and
.BR EINVAL .
This method can be used to iterate over all elements in the map.
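.IP
Because a key that is not in the map yields the first key, a full
iteration can start from any key known to be absent and stop when
.B ENOENT
is reported.
The sketch below shows that loop against a hypothetical in-memory
stand-in
.RI ( toy_get_next_key ())
that follows the rules above for a map with
.I int
keys; it also assumes that an empty map reports
.B ENOENT
straight away.
With a real map, the same loop would call
.BR bpf_get_next_key ()
on the map's file descriptor instead.
.IP
.nf
```c
#include <errno.h>

/* Hypothetical in-memory stand-in for a map with int keys,
 * stored in iteration order. */
struct toy_map {
    const int *keys;
    int n;
};

/* Mirrors the BPF_MAP_GET_NEXT_KEY rules: an absent key yields the first
 * key; the last key yields -1 with errno set to ENOENT. */
static int toy_get_next_key(const struct toy_map *m, const int *key,
                            int *next_key)
{
    int i;

    for (i = 0; i < m->n; i++)
        if (m->keys[i] == *key)
            break;

    if (i == m->n) {             /* key not found: start from the first key */
        if (m->n == 0) {
            errno = ENOENT;      /* empty map: nothing to return */
            return -1;
        }
        *next_key = m->keys[0];
        return 0;
    }
    if (i == m->n - 1) {         /* key is the last element */
        errno = ENOENT;
        return -1;
    }
    *next_key = m->keys[i + 1];
    return 0;
}

/* Visit every key, starting from a key that is assumed to be absent. */
static int count_keys(const struct toy_map *m, int absent_key)
{
    int key = absent_key, next, count = 0;

    while (toy_get_next_key(m, &key, &next) == 0) {
        count++;                 /* a real program would look up 'next' here */
        key = next;
    }
    return count;
}
```
.fi
.IP
Starting from a key that is present skips every element up to and
including it, which is why the start key must be absent.
By the same rule, if the current key is deleted during the iteration,
the next call starts over from the first key.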
.TP
.B close(map_fd)
will delete the map referred to by
.IR map_fd ,
unless a loaded program still uses it, in which case the map is deleted
when that program is unloaded.
When the user-space program that created the maps exits, all of its maps
will be deleted automatically, subject to the same condition.
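.PP
The map commands can be combined as in the following sketch, which creates a
.B BPF_MAP_TYPE_ARRAY
map with 4-byte keys and 8-byte values, stores a value, reads it back,
and closes the map.
It calls the system call through a local
.BR syscall (2)
wrapper and clears the whole
.I union bpf_attr
with
.BR memset (3)
before each command, so that the fields a command does not use are zero;
the function name
.BR map_round_trip (),
the key 6, and the value 42 are hypothetical.
.PP
.nf
```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local helpers so that this example compiles on its own. */
static __u64 to_u64(const void *p)
{
    return (__u64) (uintptr_t) p;
}

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}

/* Store 42 under key 6 in a new array map, then read it back into *out.
 * Returns 0 on success, or -1 with errno set by the failing call. */
static int map_round_trip(long long *out)
{
    union bpf_attr attr;
    int key = 6, fd, ret, saved_errno;
    long long value = 42;

    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(key);
    attr.value_size = sizeof(value);
    attr.max_entries = 256;
    fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
    if (fd < 0)
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = fd;
    attr.key = to_u64(&key);
    attr.value = to_u64(&value);
    attr.flags = BPF_ANY;          /* array elements always exist */
    ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));

    if (ret == 0) {
        memset(&attr, 0, sizeof(attr));
        attr.map_fd = fd;
        attr.key = to_u64(&key);
        attr.value = to_u64(out);  /* the kernel copies value_size bytes here */
        ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }

    saved_errno = errno;
    close(fd);                     /* release the map */
    errno = saved_errno;
    return ret;
}
```
.fi
.PP
Without sufficient privilege, the first call fails and
.BR map_round_trip ()
returns \-1 with
.I errno
set to
.BR EPERM .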
.SS BPF programs
.TP
.B BPF_PROG_LOAD
This
.I cmd
is used to load an extended BPF program into the kernel.
.nf
char bpf_log_buf[LOG_BUF_SIZE];

int bpf_prog_load(enum bpf_prog_type prog_type,
                  const struct bpf_insn *insns, int insn_cnt,
                  const char *license)
{
    union bpf_attr attr = {
        .prog_type = prog_type,
        .insns     = ptr_to_u64(insns),
        .insn_cnt  = insn_cnt,
        .license   = ptr_to_u64(license),
        .log_buf   = ptr_to_u64(bpf_log_buf),
        .log_size  = LOG_BUF_SIZE,
        .log_level = 1,
    };

    return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
.fi
.IP
.I prog_type
is one of the available program types:
.IP
.in +4n
.nf
enum bpf_prog_type {
    BPF_PROG_TYPE_UNSPEC,
    BPF_PROG_TYPE_SOCKET_FILTER,
    BPF_PROG_TYPE_SCHED_CLS,
};
.fi
.in
.IP
By picking
.IR prog_type ,
the program author selects a set of helper functions that can be called from
the program and the corresponding format of
.I struct bpf_context
(which is the data blob passed into the program as the first argument).
For example, programs loaded with a
.I prog_type
of
.B BPF_PROG_TYPE_SOCKET_FILTER
may call the
.BR bpf_map_lookup_elem ()
helper, whereas some future types may not.
The set of functions available to programs of a given type may increase
in the future.
.IP
Currently, the set of functions for
.B BPF_PROG_TYPE_SOCKET_FILTER
is:
.IP
.in +4n
.nf
bpf_map_lookup_elem(map_fd, void *key)
                    /* look up key in a map_fd */
bpf_map_update_elem(map_fd, void *key, void *value)
                    /* update key/value */
bpf_map_delete_elem(map_fd, void *key)
                    /* delete key in a map_fd */
.fi
.in
.IP
and
.I bpf_context
is a pointer to a
.IR "struct sk_buff" .
Programs cannot access fields of
.I sk_buff
directly.
.IP
More program types may be added in the future, such as
.BR BPF_PROG_TYPE_KPROBE ,
for which
.I bpf_context
may be defined as a pointer to a
.IR "struct pt_regs" .
.IP
.I insns
is an array of
.I "struct bpf_insn"
instructions.
.IP
.I insn_cnt
is the number of instructions in the program (not the size of
.I insns
in bytes).
.IP
.I license
is a license string, which must be GPL compatible to call helper functions
marked
.IR gpl_only .
.IP
.I log_buf
is a pointer to a caller-allocated buffer in which the in-kernel verifier
stores the verification log.
This log is a multi-line string that can be checked by
the program author in order to understand how the verifier came to
the conclusion that the BPF program is unsafe.
The format of the output can change at any time as the verifier evolves.
.IP
.I log_size
is the size of the buffer pointed to by
.IR log_buf .
If the buffer is not large enough to store all
verifier messages, \-1 is returned and
.I errno
is set to
.BR ENOSPC .
.IP
.I log_level
is the verbosity level of the verifier.
A value of zero means that the verifier will
not provide a log.
.TP
.B close(prog_fd)
will unload the BPF program, unless it is still attached to an event
(for example, a socket), in which case it stays loaded until it is detached.
.P
The maps are accessible from programs and are used to exchange data between
programs, and between programs and user space.
Programs process various events (like kprobes and packets) and
store their data into maps.
User space fetches data from the maps.
Either the same or a different map may be used by user space as a configuration
space to alter program behavior on the fly.
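.PP
The following sketch loads the smallest valid socket filter, two raw
.I struct bpf_insn
instructions meaning "r0 = 0; return r0", and follows the advice given under
.B EACCES
in ERRORS: it first loads the program without a log and, only if the
program is rejected, loads it again with
.I log_level
set to 1 and prints
.IR log_buf .
The helper names, the 64 KiB log buffer, and the choice to retry on
.B EINVAL
as well as
.B EACCES
are assumptions of this example.
.PP
.nf
```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static char verifier_log[65536];    /* assumed to be large enough */

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return syscall(__NR_bpf, cmd, attr, size);
}

/* Fill 'insns' with: r0 = 0; exit (return r0).  For a socket filter,
 * returning 0 means that no bytes of the packet are kept. */
static void build_return_zero(struct bpf_insn insns[2])
{
    memset(insns, 0, 2 * sizeof(insns[0]));
    insns[0].code = BPF_ALU64 | BPF_MOV | BPF_K;   /* r0 = imm (0) */
    insns[0].dst_reg = BPF_REG_0;
    insns[1].code = BPF_JMP | BPF_EXIT;            /* return r0 */
}

static int load_socket_filter(const struct bpf_insn *insns, __u32 insn_cnt)
{
    union bpf_attr attr;
    int fd, saved_errno;

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns = (__u64) (uintptr_t) insns;
    attr.insn_cnt = insn_cnt;                /* count, not bytes */
    attr.license = (__u64) (uintptr_t) "GPL";

    fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));    /* log_level = 0 */
    if (fd >= 0 || (errno != EACCES && errno != EINVAL))
        return fd;

    /* Rejected: load again with a verifier log to learn why. */
    attr.log_level = 1;
    attr.log_buf = (__u64) (uintptr_t) verifier_log;
    attr.log_size = sizeof(verifier_log);
    fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd < 0) {
        saved_errno = errno;
        fputs(verifier_log, stderr);
        errno = saved_errno;
    }
    return fd;
}
```
.fi
.PP
Loading without a log first avoids the cost of formatting verifier
messages in the common case where the program is accepted.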
.SS Events
Once a program is loaded, it can be attached to an event.
Various kernel subsystems have different ways to do so.
For example:
.PP
.in +4n
.nf
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF,
           &prog_fd, sizeof(prog_fd));
.fi
.in
.PP
will attach the program
.I prog_fd
to the socket
.IR sock ,
which was obtained by a prior call to
.BR socket (2).
.PP
In the future,
.PP
.in +4n
.nf
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
.fi
.in
.PP
may attach the program
.I prog_fd
to the perf event
.IR event_fd ,
which was obtained by a prior call to
.BR perf_event_open (2).
.SH EXAMPLES
.nf
/* bpf+sockets example:
 * 1. create array map of 256 elements
 * 2. load program that counts number of packets received
 *    r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]
 *    map[r0]++
 * 3. attach prog_fd to raw socket via setsockopt()
 * 4. print number of received TCP/UDP packets every second
 *
 * The BPF_* instruction macros and open_raw_sock() are assumed to be
 * defined elsewhere; they are not provided by <linux/bpf.h>.
 */
int main(int argc, char **argv)
{
    int sock, map_fd, prog_fd, key;
    long long value = 0, tcp_cnt, udp_cnt;

    map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(key),
                            sizeof(value), 256);
    if (map_fd < 0) {
        printf("failed to create map '%s'\\n", strerror(errno));
        /* likely not run as root */
        return 1;
    }

    struct bpf_insn prog[] = {
        BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),   /* r6 = r1 */
        BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol)),
                                /* r0 = ip->proto */
        BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),
                                /* *(u32 *)(fp - 4) = r0 */
        BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),  /* r2 = fp */
        BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */
        BPF_LD_MAP_FD(BPF_REG_1, map_fd),      /* r1 = map_fd */
        BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
                                /* r0 = map_lookup(r1, r2) */
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                                /* if (r0 == 0) goto pc+2 */
        BPF_MOV64_IMM(BPF_REG_1, 1),           /* r1 = 1 */
        BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0),
                                /* lock *(u64 *) r0 += r1 */
        BPF_MOV64_IMM(BPF_REG_0, 0),           /* r0 = 0 */
        BPF_EXIT_INSN(),                       /* return r0 */
    };

    /* insn_cnt is the number of instructions, not the size in bytes */
    prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog,
                            sizeof(prog) / sizeof(prog[0]), "GPL");
    if (prog_fd < 0) {
        printf("failed to load program '%s'\\n", strerror(errno));
        printf("%s", bpf_log_buf);             /* verifier log */
        return 1;
    }

    sock = open_raw_sock("lo");

    assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
                      sizeof(prog_fd)) == 0);

    for (;;) {
        key = IPPROTO_TCP;
        assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
        key = IPPROTO_UDP;
        assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
        printf("TCP %lld UDP %lld packets\\n", tcp_cnt, udp_cnt);
        sleep(1);
    }

    return 0;
}
.fi
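.PP
The example above leaves
.BR open_raw_sock ()
undefined.
One plausible sketch, assuming that a raw
.B AF_PACKET
socket bound to the named interface is wanted (creating it requires the
.B CAP_NET_RAW
capability), is:
.PP
.nf
```c
#include <arpa/inet.h>         /* htons() */
#include <errno.h>
#include <linux/if_ether.h>    /* ETH_P_ALL */
#include <net/if.h>            /* if_nametoindex() */
#include <netpacket/packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a raw packet socket that receives every protocol on interface
 * 'name'.  Returns the socket descriptor, or -1 with errno set. */
static int open_raw_sock(const char *name)
{
    struct sockaddr_ll sll;
    unsigned int ifindex;
    int sock;

    ifindex = if_nametoindex(name);
    if (ifindex == 0)
        return -1;              /* errno set by if_nametoindex() */

    sock = socket(AF_PACKET, SOCK_RAW | SOCK_CLOEXEC, htons(ETH_P_ALL));
    if (sock < 0)
        return -1;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = ifindex;  /* bind to this interface only */
    if (bind(sock, (struct sockaddr *) &sll, sizeof(sll)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```
.fi
.PP
Binding to a single interface keeps the counters limited to traffic on
that interface; without the
.BR bind (2)
call, the socket would see packets from all interfaces.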
.SH RETURN VALUE
For a successful call, the return value depends on the operation:
.TP
.B BPF_MAP_CREATE
The new file descriptor associated with the BPF map.
.TP
.B BPF_PROG_LOAD
The new file descriptor associated with the BPF program.
.TP
All other commands
Zero.
.PP
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EPERM
The call was made without sufficient privilege
(without the
.B CAP_SYS_ADMIN
capability).
.TP
.B ENOMEM
Cannot allocate sufficient memory.
.TP
.B EBADF
.I fd
is not an open file descriptor.
.TP
.B EFAULT
One of the pointers
.RI ( key ,
.IR value ,
.IR log_buf ,
or
.IR insns )
is outside the accessible address space.
.TP
.B EINVAL
The value specified in
.I cmd
is not recognized by this kernel.
.TP
.B EINVAL
For
.BR BPF_MAP_CREATE ,
either
.I map_type
or attributes are invalid.
.TP
.B EINVAL
For
.B BPF_MAP_*_ELEM
commands,
some of the fields of
.I "union bpf_attr"
that are not used by this command
are not set to zero.
.TP
.B EINVAL
For
.BR BPF_PROG_LOAD ,
indicates an attempt to load an invalid program.
BPF programs can be deemed
invalid due to unrecognized instructions, the use of reserved fields, jumps
out of range, infinite loops, or calls of unknown functions.
.TP
.B EACCES
For
.BR BPF_PROG_LOAD ,
even though all program instructions are valid, the program has been
rejected because it was deemed unsafe.
This may be because it may have
accessed a disallowed memory region or an uninitialized stack/register,
because the function constraints don't match the actual types, or because
there was a misaligned memory access.
In this case, it is recommended to call
.BR bpf ()
again with
.I log_level = 1
and examine
.I log_buf
for the specific reason provided by the verifier.
.TP
.B ENOSPC
For
.BR BPF_PROG_LOAD ,
the buffer described by
.I log_buf
and
.I log_size
is too small to hold all verifier messages.
.TP
.B EEXIST
For
.BR BPF_MAP_UPDATE_ELEM ,
.I flags
is
.B BPF_NOEXIST
and the element with the given
.I key
already exists.
.TP
.B ENOENT
For
.B BPF_MAP_LOOKUP_ELEM
or
.BR BPF_MAP_DELETE_ELEM ,
indicates that the element with the given
.I key
was not found.
For
.B BPF_MAP_UPDATE_ELEM
with
.BR BPF_EXIST ,
the element does not exist; for
.BR BPF_MAP_GET_NEXT_KEY ,
.I key
is the last element.
.TP
.B E2BIG
The program is too large, or a map reached the
.I max_entries
limit (maximum number of elements).
.SH NOTES
These commands may be used only by a privileged process (one having the
.B CAP_SYS_ADMIN
capability).
.SH SEE ALSO
Both classic and extended BPF are explained in the kernel source file
.IR Documentation/networking/filter.txt .