Linux perf (2): list events
Author: Onceday Date: September 3, 2023
The long road has just begun...
1. Overview
perf list lists the performance events that are available for perf record and other perf subcommands. Performance events include hardware events (such as CPU cycles, cache misses, etc.), software events (such as context switches, page faults, etc.) and tracepoint events (such as kernel function calls, traces of user space applications, etc.).
Usage: perf list [<options>] [hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]
-d, --desc Print extra event descriptions. --no-desc to not print.
-j, --json JSON encode events and metrics
-v, --long-desc Print longer event descriptions.
--debug Enable debugging output
--deprecated Print deprecated events.
--details Print information on the perf event names and expressions used internally by events.
--unit <PMU name>
Limit PMU or metric printing to the specified PMU.
The above is the perf list help output from Linux kernel 6.2. The perf tool is tightly coupled to the Linux kernel and the hardware, so the output of perf list varies considerably across kernel versions, virtual machines, and hardware environments. The availability of many performance events depends on your current hardware and software environment.
The perf tool supports a series of measurable events. The tool and the underlying kernel interface can measure events from different sources. For example, some events are pure kernel counters, in which case they are called software events. Examples include context-switches and minor-faults.
Another source of events is the processor itself and its Performance Monitoring Unit (PMU). It provides a list of events to measure microarchitectural events such as cycle counts, instruction retirements, L1 cache misses, etc. These events are called PMU hardware events or simply hardware events. They vary by processor type and model.
The perf_events interface also provides a set of commonly used hardware event names. On each processor, if these events exist, they are mapped to the actual events provided by the CPU, otherwise the events cannot be used. Somewhat confusingly, these events are also called hardware events and hardware cache events.
Finally, there are tracepoint events, which are implemented by the kernel ftrace infrastructure. These are only available in 2.6.3x and newer kernels.
PMU hardware events are CPU-specific and documented by the CPU vendor. The perf tool, if linked against the libpfm4 library, provides short descriptions of some events. For a list of PMU hardware events for Intel and AMD processors, see:
- Intel® 64 and IA-32 Architectures Software Developer’s Manual: Vol. 3B
- BIOS and Kernel Developer’s Guide (BKDG) For AMD Family 10h Processors
The events listed by perf list are the performance events supported on the local machine. The event type is shown in square brackets. The list can be long, and it differs depending on the permissions of the account running the command.
For non-root users, generally only context-switched PMU events are available. This normally means the events in the cpu PMU, predefined events such as cycles and instructions, and some software events. Other PMUs and global measurements are normally root only. Some event qualifiers, such as "any", are also root only. This can be overridden by setting the kernel.perf_event_paranoid sysctl to -1, which allows non-root users to use these events.
To access tracepoint events, perf needs read access to /sys/kernel/debug/tracing, even when perf_event_paranoid is set to a permissive value.
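The effect of kernel.perf_event_paranoid can be summarized with a small Python sketch. The level meanings follow the kernel's sysctl documentation; the function names and the summary wording are illustrative assumptions, not part of perf.

```python
from pathlib import Path

# Thresholds follow the kernel's documented perf_event_paranoid levels.
# The summary strings are this sketch's own wording.
LEVELS = [
    (2, "no kernel profiling for unprivileged users"),
    (1, "no CPU event access for unprivileged users"),
    (0, "no raw tracepoint access for unprivileged users"),
]


def describe_paranoid(level: int) -> str:
    """Return a one-line summary of what a paranoid level restricts."""
    if level <= -1:
        return "almost all events are allowed for all users"
    for threshold, text in LEVELS:
        if level >= threshold:
            return text
    raise AssertionError("unreachable: level >= 0 always matches")


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current level, or return None if it is not readable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_paranoid()
    if current is None:
        print("perf_event_paranoid is not readable on this system")
    else:
        print(current, "->", describe_paranoid(current))
```

Each higher level keeps the restrictions of the lower ones; the sketch reports only the strictest restriction that applies.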
1.1 Print events of the specified PMU unit
When the --unit <PMU name> option is used, perf list limits its output of events or metrics to a specific Performance Monitoring Unit (PMU). The PMU is a component of the processor that counts hardware events such as instructions executed, cache misses suffered, or branches mispredicted. These counts form the basis for profiling applications, tracing dynamic control flow, and identifying hot spots.
Here's an example of how to use it:
perf list --unit cpu
This command lists all events or metrics available on the CPU PMU. The PMU name must be known in advance and depends on hardware and kernel support. Commonly seen PMU names include cpu and software; the PMUs actually present on a machine appear under /sys/bus/event_source/devices.
Keep in mind that depending on your hardware and kernel configuration, not all PMUs may be available.
1.2 Event description format
With --details, the internal expression of symbolic events (cycles, cache-misses, etc.) is printed as well, as follows:
cache-misses OR cpu/cache-misses/ [Kernel PMU event]
cpu/event=0x64,umask=0x9/
cpu-cycles OR cpu/cpu-cycles/ [Kernel PMU event]
cpu/event=0x76/
Events are specified by their symbolic names, followed by optional unit masks and modifiers. Event names and unit masks are not case-sensitive. Modifiers are the exception: the modifier table below distinguishes, for example, h from H and p from P. In general, the symbolic name cache-misses can be used instead of the raw format cpu/event=0x64,umask=0x9/.
By default, events are measured at the user and kernel levels:
perf stat -e cycles dd if=/dev/zero of=/dev/null count=100000
To measure only at the user level, pass the u modifier:
perf stat -e cycles:u dd if=/dev/zero of=/dev/null count=100000
To measure user and kernel (explicitly):
perf stat -e cycles:uk dd if=/dev/zero of=/dev/null count=100000
Events can optionally have modifiers, added by appending a colon and one or more modifier letters. Modifiers let the user restrict when events are counted. The modifiers are as follows:
Modifier | Description |
---|---|
u | user-space counting |
k | kernel counting |
h | hypervisor counting |
I | non-idle counting |
G | guest counting (in KVM guests) |
H | host counting (not in KVM guests) |
p | precise level |
P | use maximum detected precise level |
S | read sample value (PERF_SAMPLE_READ) |
D | pin the event to the PMU |
W | group is weak and will fall back to non-group if not schedulable |
e | group or event is exclusive and does not share the PMU |
The p modifier specifies how precise the sampled instruction address must be. It can be given more than once, and the number of p's sets the level:
- 0 - SAMPLE_IP can have arbitrary skid
- 1 - SAMPLE_IP must have constant skid
- 2 - SAMPLE_IP requested to have 0 skid
- 3 - SAMPLE_IP must have 0 skid, or uses randomization to avoid sample shadowing effects.
On Intel systems, precise event sampling is implemented with PEBS, which supports up to precise level 2 and, in some special cases, precise level 3.
On AMD systems it is implemented using IBS (up to precise level 2). The precise modifier works with event types 0x76 (cpu-cycles, CPU clocks not halted) and 0xC1 (micro-ops retired).
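As a rough illustration of the event:modifier syntax, the Python sketch below builds an event string and derives the precise level from the number of p letters. The helper names are hypothetical, and the check covers only the modifier letters listed in this section; perf itself decides which combinations are valid.

```python
# Modifier letters taken from the table in this section.
KNOWN_MODIFIERS = set("ukhIGHpPSDWe")


def with_modifiers(event: str, modifiers: str) -> str:
    """Return 'event:modifiers', rejecting letters not in the table."""
    unknown = [m for m in modifiers if m not in KNOWN_MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {''.join(unknown)}")
    return f"{event}:{modifiers}" if modifiers else event


def precise_level(modifiers: str) -> int:
    """Each lowercase 'p' raises the requested precise level by one."""
    return modifiers.count("p")


if __name__ == "__main__":
    print(with_modifiers("cycles", "u"))    # user-level only
    print(with_modifiers("cycles", "uk"))   # user and kernel, explicit
    print(with_modifiers("cycles", "upp"), "level", precise_level("upp"))
```

The check is case-sensitive on purpose, because h and H (and p and P) are different modifiers.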
2. Details
2.1 perf list performance event classification
By default, perf list lists all known events. You can also list certain types of events through the following categories:
Event class | Description |
---|---|
hw or hardware | List hardware events such as cache-misses |
sw or software | List software events such as context switches |
cache or hwcache | List hardware cache events such as L1-dcache-loads |
tracepoint | List all tracepoint events. A subsys_glob:event_glob pattern can also be given to filter tracepoint events by subsystem, such as sched, block, etc. |
pmu | List PMU events provided by the kernel |
sdt | List all Statically Defined Tracepoint (SDT) events |
metric | List metrics |
metricgroup | List metric groups together with their metrics |
--raw-dump | Display all events in raw format. This option can be followed by [hw|sw|cache|tracepoint|pmu|event_glob] |
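The subsys_glob:event_glob filter can be pictured with the Python sketch below. It uses the standard fnmatch module as a stand-in for perf's own glob matcher, which may differ in detail; the function name and the sample tracepoint list are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def filter_tracepoints(names, pattern):
    """Keep 'subsys:event' names matching a 'subsys_glob:event_glob' pattern."""
    sub_glob, _, ev_glob = pattern.partition(":")
    ev_glob = ev_glob or "*"  # no event part: accept any event name
    kept = []
    for name in names:
        subsys, _, event = name.partition(":")
        if fnmatchcase(subsys, sub_glob) and fnmatchcase(event, ev_glob):
            kept.append(name)
    return kept


# Sample names for illustration; a real list comes from `perf list tracepoint`.
SAMPLE = ["sched:sched_switch", "sched:sched_wakeup", "block:block_rq_issue"]

if __name__ == "__main__":
    print(filter_tracepoints(SAMPLE, "sched:*"))
    print(filter_tracepoints(SAMPLE, "*:*rq*"))
```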
2.2 Measuring PMU events on specific hardware
For details on this section, see: perf-list(1) - Linux manual page (man7.org)
Even when an event is not yet available in symbolic form in perf, it can still be encoded in a processor-specific way.
For example, on x86 CPUs, to measure a PMU event described in the CPU hardware vendor's documentation, pass its raw hexadecimal code:
perf stat -e r1a8 -a sleep 1
perf record -e r1a8 ...
Some processors, such as those from AMD, support event codes and unit masks larger than one byte. In this case, the bit positions used by the event configuration parameter can be read from the output of the following command:
cat /sys/bus/event_source/devices/cpu/format/event
For example, possible commands are as follows:
perf record -e r20000038f -a sleep 1
perf record -e cpu/r20000038f/ ...
perf record -e cpu/r0x20000038f/ ...
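To make the raw encoding concrete, the Python sketch below places an event number and a unit mask at the bit positions named by a sysfs format string. Everything here is an assumption for illustration: the function names are hypothetical, the format strings config:0-7, config:8-15, and config:0-7,32-35 are assumed file contents, and the input values (event 0xA8 with umask 0x01, event 0x28F with umask 0x03) were chosen because they reproduce the codes r1a8 and r20000038f used above. On a real system, read the format files and the vendor documentation.

```python
def parse_format(spec: str):
    """Parse a string like 'config:0-7,32-35' into (field, bit positions)."""
    field, _, ranges = spec.strip().partition(":")
    bits = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if hi else first
        bits.extend(range(first, last + 1))
    return field, bits


def scatter(value: int, bits) -> int:
    """Put bit i of value at position bits[i] of the result."""
    out = 0
    for i, pos in enumerate(bits):
        if (value >> i) & 1:
            out |= 1 << pos
    return out


def raw_code(event: int, umask: int,
             event_fmt: str = "config:0-7",
             umask_fmt: str = "config:8-15") -> str:
    """Build an rNNN raw event string from event and umask values."""
    _, event_bits = parse_format(event_fmt)
    _, umask_bits = parse_format(umask_fmt)
    value = scatter(event, event_bits) | scatter(umask, umask_bits)
    return f"r{value:x}"


if __name__ == "__main__":
    print(raw_code(0xA8, 0x01))
    print(raw_code(0x28F, 0x03, event_fmt="config:0-7,32-35"))
```

With a split event field, the low byte of the event number stays in bits 0-7 and the remaining bits move to bits 32-35, which is why the second code is wider than a simple umask-then-event concatenation.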
For PMU events on specific hardware, refer to the processor documentation to determine the correct encoding.
Available PMUs and their raw parameters can be viewed at the following path:
ls /sys/devices/*/format
Some PMUs are associated not with a core but with a whole CPU socket. Events on these PMUs usually cannot be sampled and can only be counted globally with perf stat -a. They can be bound to one logical CPU, but will measure all CPUs in the same socket.
This example measures memory bandwidth every second on the first memory controller on socket 0 of an Intel Xeon system:
perf stat -C 0 -a -e uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ -I 1000 ...
Each memory controller has its own PMU. Measuring the bandwidth of the whole system requires specifying all imc PMUs (see the perf list output) and summing the values. To simplify the creation of multiple events, prefix and glob matching are supported in PMU names, and the prefix uncore_ is also ignored when matching. The command above can therefore be extended to all memory controllers with either of the following:
perf stat -C 0 -a -e imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
perf stat -C 0 -a -e '*imc*/cas_count_read/,*imc*/cas_count_write/' -I 1000 ...
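The PMU name matching described above can be approximated with the Python sketch below. It is a simplified model, not perf's actual matching code: the function name is hypothetical, fnmatch stands in for perf's glob matcher, and the PMU names are sample values.

```python
from fnmatch import fnmatchcase


def match_pmus(pattern, pmu_names):
    """Return PMU names selected by a prefix or glob pattern.

    Each name is tried both as-is and with a leading 'uncore_' removed.
    """
    matched = []
    for name in pmu_names:
        short = name[len("uncore_"):] if name.startswith("uncore_") else name
        candidates = (name, short)
        if any(fnmatchcase(c, pattern) or c.startswith(pattern)
               for c in candidates):
            matched.append(name)
    return matched


# Sample PMU names for illustration only.
PMUS = ["cpu", "uncore_imc_0", "uncore_imc_1", "uncore_cbox_0"]

if __name__ == "__main__":
    print(match_pmus("uncore_imc_0", PMUS))  # one controller
    print(match_pmus("imc", PMUS))           # prefix, 'uncore_' ignored
    print(match_pmus("*imc*", PMUS))         # glob
```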
2.3 Parameterized performance events
For some PMUs, a ? appears in the event string when events are listed, as follows:
hv_gpci/dtbp_ptitc,phys_processor_idx=?/
This means that when the event is used, a value must be supplied in place of the ?:
perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
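Filling in such a placeholder is simple string substitution, as in the Python sketch below. The helper name is hypothetical, and formatting values as hexadecimal is a choice made here to mirror the example, not a requirement stated by perf.

```python
def fill_params(template: str, **values: int) -> str:
    """Replace each 'name=?' placeholder in a PMU event string."""
    out = template
    for name, value in values.items():
        placeholder = f"{name}=?"
        if placeholder not in out:
            raise KeyError(f"no placeholder for {name!r}")
        out = out.replace(placeholder, f"{name}={value:#x}")
    if "=?" in out:
        raise ValueError("unfilled parameter remains")
    return out


if __name__ == "__main__":
    listed = "hv_gpci/dtbp_ptitc,phys_processor_idx=?/"
    print(fill_params(listed, phys_processor_idx=0x2))
```

The final check refuses to return a string that still contains a ?, since such an event string is incomplete.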
It is also possible to add extra qualifiers to an event, such as percore:
perf stat -e cpu/event=0,umask=0x3,percore=1/
The above command sums the event counts of all hardware threads in a core.
2.4 Event group measurement
Perf supports time-based event multiplexing when the number of active events exceeds the number of hardware performance counters. Multiplexing can lead to measurement errors when a workload changes its execution profile.
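When an event is multiplexed, it is counted only for part of the run, and the reported value is an estimate scaled by the ratio of enabled time to running time. The Python sketch below shows that arithmetic; the function names and the sample numbers are illustrative assumptions.

```python
def scaled_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Estimate the full-run count from a multiplexed measurement."""
    if time_running == 0:
        raise ValueError("event was never scheduled on a counter")
    return raw * time_enabled / time_running


def running_fraction(time_enabled: int, time_running: int) -> float:
    """Share of the enabled time during which the event was counting."""
    return time_running / time_enabled


if __name__ == "__main__":
    # Hypothetical numbers: counted for a quarter of a 2 ms window.
    raw, enabled, running = 1_000_000, 2_000_000, 500_000
    print(scaled_count(raw, enabled, running))
    print(f"{running_fraction(enabled, running):.0%}")
```

The estimate assumes the event rate was the same while the event was not counting, which is exactly what fails when a workload changes its execution profile.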
When calculating metrics using formulas derived from event counts, it is useful to ensure that some events are always measured together as a group to minimize multiplexing errors. Event groups can be specified using {}.
perf stat -e '{instructions,cycles}' ...
The number of available performance counters depends on the CPU. A group cannot contain more events than available counters. For example, Intel Core CPUs typically have four general-purpose core performance counters, plus three fixed counters for instructions, cycles, and ref-cycles. Some special events have restrictions on which counters they can be scheduled on, and may not support multiple instances in a single group. When too many events are specified in a group, some of them will not be measured.
Globally pinned events can limit the number of counters available to other groups. On x86 systems, the NMI watchdog pins a counter by default. The NMI watchdog can be disabled as root:
echo 0 > /proc/sys/kernel/nmi_watchdog
Events from multiple different PMUs cannot be mixed in a group, with the exception of software events.
perf also supports group leader sampling using the :S specifier.
perf record -e '{cycles,instructions}:S' ...
perf report --group
Normally, all events in an event group are sampled, but with :S only the first event (the leader) samples, and it only reads the values of the other events in the group. However, in the case of AUX area events (such as Intel PT or CoreSight), the AUX area event must be the leader, so the second event samples instead of the first.
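The group syntax from this section can be generated with a short Python sketch. The helper name is hypothetical, and max_counters is supplied by the caller because the real limit depends on the CPU. The sketch counts every event against that limit, which is a simplification: software events, for example, do not occupy hardware counters.

```python
def build_group(events, max_counters: int, leader_sampling: bool = False) -> str:
    """Return a '{a,b,...}' group string, optionally with the :S specifier."""
    if not events:
        raise ValueError("a group needs at least one event")
    if len(events) > max_counters:
        raise ValueError(
            f"{len(events)} events exceed {max_counters} available counters")
    spec = "{" + ",".join(events) + "}"
    return spec + ":S" if leader_sampling else spec


if __name__ == "__main__":
    print(build_group(["instructions", "cycles"], max_counters=4))
    print(build_group(["cycles", "instructions"], max_counters=4,
                      leader_sampling=True))
```

The resulting string is meant to be passed to -e in single quotes, as in the perf stat and perf record commands above.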