perf_event_open.2: PERF_RECORD_SWITCH support

Linux 4.3 introduced two new record types for recording context
switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.

The advantage over the existing tracepoint and software context
switch events is primarily that full switch in/out data can be
gathered even in the face of restrictive perf_event_paranoid
settings.
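
As a rough illustration of the point above, the following is a minimal sketch of requesting the new records from user space. The helper names init_switch_attr() and open_switch_event() are hypothetical. The choice of a dummy software event with exclude_kernel set is an assumption made for this example, not something this commit prescribes, and the open can still fail on systems that restrict perf_event_open(2) entirely.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill in an attr that asks for context-switch
 * records. Assumption: a dummy software event is used so that only
 * side-band records (such as PERF_RECORD_SWITCH) are generated. */
void init_switch_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY;
    attr->context_switch = 1;   /* the new bit added in Linux 4.3 */
    attr->sample_id_all = 1;    /* append sample_id to non-sample records */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
    attr->exclude_kernel = 1;   /* assumption: user-space-only measurement */
    attr->disabled = 1;         /* enable later via ioctl() */
}

/* Hypothetical helper: open the event. There is no glibc wrapper for
 * perf_event_open(2), so syscall(2) is used. Returns a file descriptor,
 * or -1 with errno set. */
int open_switch_event(pid_t pid, int cpu)
{
    struct perf_event_attr attr;

    init_switch_attr(&attr);
    return (int) syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0UL);
}
```

Calling open_switch_event(0, -1) would monitor the calling process on any CPU. Passing pid -1 with a specific CPU number selects CPU-wide mode, which usually needs more privilege.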

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Vince Weaver 2016-10-18 13:22:20 -04:00 committed by Michael Kerrisk
parent 9227137a38
commit 9277a75d39
1 changed files with 84 additions and 5 deletions


@@ -243,8 +243,9 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are
due to exec */
use_clockid : 1, /* use clockid for time fields */
context_switch : 1, /* context switch data */
__reserved_1 : 38;
__reserved_1 : 37;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1112,6 +1113,21 @@ field.
This can make it easier to correlate perf sample times with
timestamps generated by other tools.
.TP
.IR "context_switch" " (since Linux 4.3)"
.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
This enables the generation of
.B PERF_RECORD_SWITCH
records when a context switch occurs.
It also enables the generation of
.B PERF_RECORD_SWITCH_CPU_WIDE
records when sampling in CPU-wide mode.
This functionality is in addition to existing tracepoint and
software events for measuring context switches.
The advantage of this method is that it will give full
information even with strict
.I perf_event_paranoid
settings.
.TP
.IR "wakeup_events" ", " "wakeup_watermark"
This union sets how many samples
.RI ( wakeup_events )
@@ -1792,7 +1808,8 @@ Sample happened in guest user code.
.RE
.RS
In addition, one of the following bits can be set:
The following three statuses are generated by
different record types so they alias to the same bit:
.TP
.BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
.\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
@@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
if a process name change was caused by an
.BR exec (2)
system call.
It is an alias for
.B PERF_RECORD_MISC_MMAP_DATA
since the two values would not be set in the same record.
.TP
.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
When a
.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
record is generated, this bit indicates that the
context switch is away from the current process
(instead of into the current process).
.RE
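Because the three statuses share one bit, a reader of the ring buffer has to look at the record type before interpreting it. A minimal sketch of that dispatch follows. The function name aliased_misc_meaning() and the returned strings are hypothetical, and the sketch assumes kernel headers from Linux 4.3 or later.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: describe what the shared misc bit means for a
 * given record type. Returns NULL when the bit is clear or when the
 * record type is not one of those handled in this sketch. */
const char *aliased_misc_meaning(uint32_t type, uint16_t misc)
{
    /* The three macros name the same bit, so testing one suffices. */
    if (!(misc & PERF_RECORD_MISC_SWITCH_OUT))
        return NULL;

    switch (type) {
    case PERF_RECORD_MMAP:
    case PERF_RECORD_MMAP2:
        return "mapping is not executable (MMAP_DATA)";
    case PERF_RECORD_COMM:
        return "name change caused by exec (COMM_EXEC)";
    case PERF_RECORD_SWITCH:
    case PERF_RECORD_SWITCH_CPU_WIDE:
        return "context switch away from the current process (SWITCH_OUT)";
    default:
        return NULL;
    }
}
```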
.RS
In addition, the following bits can be set:
.TP
.B PERF_RECORD_MISC_EXACT_IP
This indicates that the content of
@@ -2583,6 +2609,59 @@ struct {
.I lost
the number of potentially lost samples.
.RE
.TP
.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
This record indicates a context switch has happened.
The
.B PERF_RECORD_MISC_SWITCH_OUT
bit in the
.I misc
field indicates whether it was a context switch into
or away from the current process.
.in +4n
.nf
struct {
struct perf_event_header header;
struct sample_id sample_id;
};
.fi
.TP
.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
As with
.BR PERF_RECORD_SWITCH ,
this record indicates a context switch has happened,
but it only occurs when sampling in CPU-wide mode
and provides additional information on the process
being switched to/from.
The
.B PERF_RECORD_MISC_SWITCH_OUT
bit in the
.I misc
field indicates whether it was a context switch into
or away from the current process.
.in +4n
.nf
struct {
struct perf_event_header header;
u32 next_prev_pid;
u32 next_prev_tid;
struct sample_id sample_id;
};
.fi
.RS
.TP
.I next_prev_pid
The process ID of the previous (if switching in)
or next (if switching out) process on the CPU.
.TP
.I next_prev_tid
The thread ID of the previous (if switching in)
or next (if switching out) thread on the CPU.
.RE
.RE
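The two record layouts above could be decoded along the following lines. This is only a sketch: struct switch_info and decode_switch() are hypothetical names, the record is assumed to lie contiguously in memory (a real reader must handle records that wrap around the end of the mmap ring buffer), and the trailing sample_id is ignored.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of a switch record (not a kernel type). */
struct switch_info {
    int      switch_out;     /* 1: switched away from the current process */
    int      cpu_wide;       /* 1: record was PERF_RECORD_SWITCH_CPU_WIDE */
    uint32_t next_prev_pid;  /* valid only when cpu_wide is set */
    uint32_t next_prev_tid;  /* valid only when cpu_wide is set */
};

/* Returns 1 if rec is a switch record and *out was filled in,
 * 0 if it is some other record type, -1 if the record is too short. */
int decode_switch(const void *rec, struct switch_info *out)
{
    const unsigned char *p = rec;
    struct perf_event_header hdr;

    memcpy(&hdr, p, sizeof(hdr));
    memset(out, 0, sizeof(*out));

    if (hdr.type != PERF_RECORD_SWITCH &&
        hdr.type != PERF_RECORD_SWITCH_CPU_WIDE)
        return 0;

    /* Same bit and same meaning for both record types. */
    out->switch_out = (hdr.misc & PERF_RECORD_MISC_SWITCH_OUT) != 0;

    if (hdr.type == PERF_RECORD_SWITCH_CPU_WIDE) {
        if (hdr.size < sizeof(hdr) + 2 * sizeof(uint32_t))
            return -1;
        out->cpu_wide = 1;
        /* The two u32 fields follow the header directly. */
        memcpy(&out->next_prev_pid, p + sizeof(hdr), sizeof(uint32_t));
        memcpy(&out->next_prev_tid, p + sizeof(hdr) + sizeof(uint32_t),
               sizeof(uint32_t));
    }
    return 1;
}
```

memcpy() is used rather than pointer casts so that the sketch does not depend on the alignment of the buffer it is handed.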
.SS Overflow handling
Events can be set to notify when a threshold is crossed,