Linux perf (2): perf list events


Author: Onceday Date: September 3, 2023

The long road has just begun...

Reference documents:

1 Overview

perf list lists the available performance events, which can then be used by perf record and other perf subcommands for performance analysis. Performance events include hardware events (such as CPU cycles and cache misses), software events (such as context switches and page faults), and tracepoint events (such as kernel function calls and traces of user-space applications).

 Usage: perf list [<options>] [hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]

    -d, --desc            Print extra event descriptions. --no-desc to not print.
    -j, --json            JSON encode events and metrics
    -v, --long-desc       Print longer event descriptions.
        --debug           Enable debugging output
        --deprecated      Print deprecated events.
        --details         Print information on the perf event names and expressions used internally by events.
        --unit <PMU name>
                          Limit PMU or metric printing to the specified PMU.

The above is the perf list help output on Linux kernel 6.2. The perf tool is tightly coupled to the Linux kernel and the hardware, so the output of perf list differs considerably across kernel versions, virtual machines, and hardware environments. Whether a given performance event is available depends on the current hardware and software environment.

The perf tool supports a set of measurable events. The tool and the underlying kernel interface can measure events from different sources. Some events are pure kernel counters, and these are called software events. Examples include context switches and minor faults.

Another source of events is the processor itself and its Performance Monitoring Unit (PMU). The PMU provides a list of events for counting microarchitectural activity such as cycles, instructions retired, and L1 cache misses. These events are called PMU hardware events, or simply hardware events. They vary by processor type and model.

The perf_events interface also provides a small set of common hardware event names. On each processor, these names are mapped to actual events provided by the CPU if such events exist. Otherwise the event cannot be used. Somewhat confusingly, these are also called hardware events and hardware cache events.
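
Internally, perf translates each generic name into a (type, config) pair for the perf_event_open(2) system call. The sketch below reproduces part of that mapping with the constants from the UAPI header <linux/perf_event.h>. The dictionaries and function names are illustrative, and only a subset of events is included.

```python
# Constants from the Linux UAPI header <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0
PERF_TYPE_HW_CACHE = 3

# Subset of enum perf_hw_id, keyed by the names perf list prints.
HW_EVENTS = {
    "cpu-cycles": 0,           # PERF_COUNT_HW_CPU_CYCLES
    "cycles": 0,               # alias of cpu-cycles
    "instructions": 1,         # PERF_COUNT_HW_INSTRUCTIONS
    "cache-references": 2,     # PERF_COUNT_HW_CACHE_REFERENCES
    "cache-misses": 3,         # PERF_COUNT_HW_CACHE_MISSES
    "branch-instructions": 4,  # PERF_COUNT_HW_BRANCH_INSTRUCTIONS
    "branch-misses": 5,        # PERF_COUNT_HW_BRANCH_MISSES
    "bus-cycles": 6,           # PERF_COUNT_HW_BUS_CYCLES
    "ref-cycles": 9,           # PERF_COUNT_HW_REF_CPU_CYCLES
}

# enum perf_hw_cache_id, perf_hw_cache_op_id, perf_hw_cache_op_result_id,
# keyed by the name fragments perf uses (e.g. "L1-dcache" + "load" + "miss").
CACHE_IDS = {"L1-dcache": 0, "L1-icache": 1, "LLC": 2, "dTLB": 3,
             "iTLB": 4, "branch": 5, "node": 6}
CACHE_OPS = {"load": 0, "store": 1, "prefetch": 2}
CACHE_RESULTS = {"access": 0, "miss": 1}


def hardware_event(name):
    """Return (attr.type, attr.config) for a generic hardware event name."""
    return PERF_TYPE_HARDWARE, HW_EVENTS[name]


def hw_cache_event(cache, op, result):
    """Return (attr.type, attr.config) for a generic hardware cache event."""
    config = CACHE_IDS[cache] | (CACHE_OPS[op] << 8) | (CACHE_RESULTS[result] << 16)
    return PERF_TYPE_HW_CACHE, config
```

For instance, L1-dcache-loads corresponds to hw_cache_event("L1-dcache", "load", "access"), which is (3, 0). L1-dcache-load-misses corresponds to (3, 0x10000). Whether the kernel can map such a pair onto a real CPU event is the availability question described above.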

Finally, there are tracepoint events, implemented by the kernel ftrace infrastructure. They are only available in kernels 2.6.3x and newer.

PMU hardware events are CPU-specific and documented by the CPU vendor. If perf is linked against libpfm4, it provides short descriptions for some events. For the lists of PMU hardware events on Intel and AMD processors, see the vendors' documentation.

The events shown by perf list are the performance events supported on the local machine, and each event's type is shown in square brackets. The list can be long, and it differs depending on the privileges of the account running the command.
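
When the output is long, it can help to group it by the bracketed event type. The sketch below parses text laid out the way perf list prints events: the event name, optional aliases joined by OR, and then the type in square brackets. The function name is hypothetical. Lines without a trailing bracketed type are skipped, such as headings and the indented expression lines printed by --details. Running perf list with --no-desc is assumed, because bracketed description lines could otherwise be mistaken for events.

```python
import re
from collections import defaultdict

# Matches lines such as "  cache-misses OR cpu/cache-misses/   [Kernel PMU event]".
LINE_RE = re.compile(r"^\s*(?P<names>\S.*?)\s+\[(?P<kind>[^\]]+)\]\s*$")


def group_by_type(perf_list_output):
    """Group event names from perf list output by their bracketed type."""
    groups = defaultdict(list)
    for line in perf_list_output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # headings, blank lines, --details expression lines
        # "A OR B" lists aliases of one event; keep the first as its name.
        name = match.group("names").split(" OR ")[0].strip()
        groups[match.group("kind")].append(name)
    return dict(groups)
```

For example, the cache-misses and cpu-cycles lines end up under "Kernel PMU event", and context-switches ends up under "Software event".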

For non-root users, generally only context-switched (per-task) PMU events are available. This usually means the events in the cpu PMU, predefined events such as cycles and instructions, and some software events. Other PMUs and system-wide measurements are normally root-only, and so are some event qualifiers, such as "any". This can be overridden with sysctl by setting kernel.perf_event_paranoid to -1, which allows non-root users to use these events. To access tracepoint events, perf needs read access to /sys/kernel/debug/tracing, even when perf_event_paranoid is set to a permissive value.
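
The paranoid levels are cumulative. A minimal sketch, based on the kernel's sysctl documentation for perf_event_paranoid, reads the current value and lists what an unprivileged user cannot do at that level. The function names are hypothetical. Users with CAP_PERFMON (Linux 5.8 and later) or CAP_SYS_ADMIN are not subject to these restrictions.

```python
from pathlib import Path

# Documented meaning of kernel.perf_event_paranoid for users without
# CAP_PERFMON; each level adds to the restrictions of the levels below it.
_RESTRICTIONS = [
    (0, "no raw tracepoint or ftrace function tracepoint access"),
    (1, "no CPU-wide (system-wide) events"),
    (2, "no kernel profiling"),
]


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current perf_event_paranoid value."""
    return int(Path(path).read_text().strip())


def unprivileged_restrictions(level):
    """List what an unprivileged user may not do at the given paranoid level."""
    return [text for threshold, text in _RESTRICTIONS if level >= threshold]
```

At level -1 the list is empty. At level 2, the usual upstream default, all three restrictions apply. Some distribution kernels accept values above 2 with extra restrictions, and this sketch does not model them.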

1.1 Print events of the specified PMU unit

The --unit <PMU name> option limits perf list output to the events or metrics of one specific Performance Monitoring Unit (PMU). A PMU is a part of the processor that counts hardware events such as instructions executed, cache misses, or mispredicted branches. These counts are the basis for analyzing an application's dynamic control flow and finding hot spots.

Here's an example of how to use it:

perf list --unit cpu

This command lists all events or metrics of the CPU PMU. The PMU name must be known in advance, and it depends on hardware and kernel support. Common PMU names include cpu, software, and tracepoint. The PMUs present on a given machine are listed under /sys/bus/event_source/devices.
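
The PMU names accepted by --unit correspond to the PMUs registered by the kernel. The sketch below lists them, assuming the usual sysfs layout: each PMU directory has a type file and may have an events directory of aliases. The function names are hypothetical. perf also ships its own built-in event tables, so perf list --unit may show more events than the sysfs aliases returned here.

```python
from pathlib import Path

SYSFS_PMUS = "/sys/bus/event_source/devices"
# Files next to an alias that describe it rather than name a new event.
_ALIAS_SUFFIXES = (".scale", ".unit", ".per-pkg", ".snapshot")


def list_pmus(root=SYSFS_PMUS):
    """Map each PMU name to the perf_event_attr.type value in its 'type' file."""
    pmus = {}
    for dev in sorted(Path(root).iterdir()):
        type_file = dev / "type"
        if type_file.is_file():
            pmus[dev.name] = int(type_file.read_text().strip())
    return pmus


def list_pmu_events(pmu, root=SYSFS_PMUS):
    """Names of the event aliases the kernel exports for one PMU (may be empty)."""
    events_dir = Path(root) / pmu / "events"
    if not events_dir.is_dir():
        return []
    return sorted(p.name for p in events_dir.iterdir()
                  if not p.name.endswith(_ALIAS_SUFFIXES))
```

On a typical x86 machine, list_pmus() includes cpu, software, and tracepoint, plus any uncore PMUs that the hardware provides.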

Keep in mind that depending on your hardware and kernel configuration, not all PMUs may be available.

1.2 Event description format

With --details, perf list additionally prints the internal expression of symbolic events (cycles, cache-misses, etc.), as follows:

  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
        cpu/event=0x64,umask=0x9/
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
        cpu/event=0x76/

Events are specified by their symbolic names, optionally with unit masks and modifiers. Event names, unit masks, and modifiers are not case-sensitive. In general, the symbolic notation cache-misses can be used in place of the format cpu/event=0x64,umask=0x9/.
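
To show how the two notations relate, the following sketch splits the pmu/term=value,.../ form into the PMU name, its terms, and any trailing modifiers. It is a simplified, hypothetical parser for illustration, not perf's own event grammar. Numeric values use Python's base-prefix rules. A bare term such as cpu-cycles in cpu/cpu-cycles/ is kept as an alias with no value.

```python
def parse_pmu_event(spec):
    """Split 'pmu/term=value,.../modifiers' into (pmu, terms, modifiers)."""
    pmu, sep, rest = spec.partition("/")
    body, sep2, modifiers = rest.rpartition("/")
    if not (pmu and sep and sep2):
        raise ValueError(f"not in pmu/terms/ form: {spec!r}")
    terms = {}
    for term in body.split(","):
        term = term.strip()
        if not term:
            continue
        key, eq, value = term.partition("=")
        # Term names are case-insensitive; a bare term names an event alias.
        terms[key.lower()] = int(value, 0) if eq else None
    return pmu, terms, modifiers
```

For example, parse_pmu_event("cpu/event=0x64,umask=0x9/") returns ("cpu", {"event": 100, "umask": 9}, "").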

By default, events are measured at the user and kernel levels:

perf stat -e cycles dd if=/dev/zero of=/dev/null count=100000

To measure only at the user level, append the u modifier:

perf stat -e cycles:u dd if=/dev/zero of=/dev/null count=100000

To measure at both the user and kernel levels explicitly:

perf stat -e cycles:uk dd if=/dev/zero of=/dev/null count=100000

Events can optionally take modifiers, written as one or more letters after a colon. Modifiers let the user restrict when an event is counted. The modifiers are:

  Modifier  Description
  u         user-space counting
  k         kernel counting
  h         hypervisor counting
  I         non-idle counting
  G         guest counting (in KVM guests)
  H         host counting (not in KVM guests)
  p         precise level
  P         use maximum detected precise level
  S         read sample value (PERF_SAMPLE_READ)
  D         pin the event to the PMU
  W         group is weak and will fall back to non-group if not schedulable
  e         group or event is exclusive and does not share the PMU
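
The privilege-level modifiers interact: naming any of u, k, or h excludes the levels that are not named. The sketch below is a simplified model of how a modifier string maps onto perf_event_attr bit fields. The field names exclude_user, exclude_kernel, exclude_hv, exclude_idle, exclude_guest, exclude_host, precise_ip, pinned, and exclusive are real perf_event_attr fields. The "sample_read" key is a stand-in for setting PERF_SAMPLE_READ in attr.sample_type. P and W are accepted but not modeled.

```python
def parse_modifiers(mods):
    """Map an event modifier string (e.g. 'uk', 'ppp') to perf_event_attr-style flags."""
    attr = dict.fromkeys(
        ["exclude_user", "exclude_kernel", "exclude_hv", "exclude_idle",
         "exclude_guest", "exclude_host", "precise_ip", "pinned",
         "exclusive", "sample_read"], 0)
    if any(c in mods for c in "ukh"):
        # Naming any privilege level excludes every level that was not named.
        attr["exclude_user"] = int("u" not in mods)
        attr["exclude_kernel"] = int("k" not in mods)
        attr["exclude_hv"] = int("h" not in mods)
    if any(c in mods for c in "GH"):
        attr["exclude_guest"] = int("G" not in mods)
        attr["exclude_host"] = int("H" not in mods)
    for c in mods:
        if c == "p":
            attr["precise_ip"] = min(attr["precise_ip"] + 1, 3)  # clamped in this sketch
        elif c == "I":
            attr["exclude_idle"] = 1
        elif c == "D":
            attr["pinned"] = 1
        elif c == "e":
            attr["exclusive"] = 1
        elif c == "S":
            attr["sample_read"] = 1  # stands for PERF_SAMPLE_READ in sample_type
        elif c not in "ukhGHPW":
            raise ValueError(f"unknown modifier {c!r}")
    return attr
```

For example, "cycles:uk".partition(":")[2] gives "uk". That string leaves user and kernel counting enabled but sets exclude_hv, which is why cycles:uk is not quite the same as the default of no modifiers.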

The p modifier controls how precise the sampled instruction address (SAMPLE_IP) is. It can be repeated (p, pp, ppp) to request a higher level:

  • 0 - SAMPLE_IP can have arbitrary skid
  • 1 - SAMPLE_IP must have constant skid
  • 2 - SAMPLE_IP requested to have 0 skid
  • 3 - SAMPLE_IP must have 0 skid, or uses randomization to avoid sample shadowing effects

On Intel systems, precise event sampling is implemented with PEBS. PEBS supports up to precise level 2, and precise level 3 in some special cases.

On AMD systems, it is implemented with IBS (up to precise level 2). The precise modifier works with event types 0x76 (cpu-cycles, CPU clocks not halted) and 0xC1 (micro-ops retired).

2 Details

2.1 perf list performance event classification

By default, perf list lists all known events. You can also list certain types of events through the following categories:

  Category          Description
  hw or hardware    List hardware events, such as cache-misses
  sw or software    List software events, such as context switches
  cache or hwcache  List hardware cache events, such as L1-dcache-loads
  tracepoint        List all tracepoint events; subsys_glob:event_glob filters by subsystem, e.g. sched or block
  pmu               Print PMU events provided by the kernel
  sdt               List all Statically Defined Tracing (SDT) events
  metric            List metrics
  metricgroup       List metric groups and their metrics
  --raw-dump        Print all events in raw format; may be followed by [hw|sw|cache|tracepoint|pmu|event_glob]

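
The subsys_glob:event_glob filter can be modeled with shell-style globbing. The sketch below reads tracepoint names from tracefs and filters them. It assumes tracefs is reachable at /sys/kernel/debug/tracing (many newer systems also mount it at /sys/kernel/tracing) and that each event directory contains an id file. Both function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def list_tracepoints(root="/sys/kernel/debug/tracing/events"):
    """Return 'subsys:event' names for every event directory that has an id file."""
    return sorted(f"{ev.parent.name}:{ev.name}"
                  for ev in Path(root).glob("*/*") if (ev / "id").is_file())


def filter_tracepoints(names, pattern):
    """Keep names matching 'subsys_glob:event_glob' (the event glob defaults to '*')."""
    subsys_glob, _, event_glob = pattern.partition(":")
    event_glob = event_glob or "*"
    kept = []
    for name in names:
        subsys, _, event = name.partition(":")
        if fnmatchcase(subsys, subsys_glob) and fnmatchcase(event, event_glob):
            kept.append(name)
    return kept
```

On the command line, the same kind of filter is written as perf list 'sched:*'. The quotes keep the shell from expanding the glob.
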
2.2 Measuring PMU events on specific hardware

For details about this chapter, please refer to the document: perf-list(1) - Linux manual page (man7.org)

Even when an event has no symbolic name in perf, it can still be specified with a processor-specific encoding.

For example, on x86 CPUs, an event from the CPU vendor's documentation can be passed as rN. Here N is the hexadecimal raw encoding of the event-select register, with the event code in bits 0-7 and the unit mask in bits 8-15. An Intel event with code 0xA8 and unit mask 0x01 is therefore encoded as 0x1A8:

perf stat -e r1a8 -a sleep 1
perf record -e r1a8 ...
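
The rN value is the register fields packed together. The sketch below builds the rN string from the fields a vendor event table lists. It assumes the Intel IA32_PERFEVTSELx layout: event select in bits 0-7, unit mask in bits 8-15, edge detect in bit 18, invert in bit 23, and counter mask in bits 24-31. The function name is hypothetical. The user and kernel enable bits are deliberately left clear, because privilege levels are chosen with the u/k modifiers instead.

```python
def intel_raw(event_select, umask, edge=False, inv=False, cmask=0):
    """Pack event fields into a perf 'rN' raw event string (Intel layout assumed)."""
    for name, value in (("event_select", event_select), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    config = (event_select          # bits 0-7: event select
              | umask << 8          # bits 8-15: unit mask
              | int(edge) << 18     # bit 18: edge detect
              | int(inv) << 23      # bit 23: invert counter mask
              | cmask << 24)        # bits 24-31: counter mask
    return f"r{config:x}"
```

intel_raw(0xA8, 0x01) yields "r1a8", which matches the commands above. Adding cmask=1 and inv=True yields "r18001a8". Where the cpu PMU exports those format fields, this can also be written as cpu/event=0xa8,umask=0x1,cmask=1,inv=1/.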

Some processors, such as those from AMD, support event codes and unit masks wider than one byte. In that case, the bit positions of each event configuration field can be read from sysfs, for example:

 cat /sys/bus/event_source/devices/cpu/format/event
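
Each file in that format directory describes where one field lives inside perf_event_attr. A field can be a single range such as config:0-7, or several comma-separated ranges when it is split. The sketch below parses such a string and scatters a field value into those bits, lowest bits first. It assumes an AMD-style event format of config:0-7,32-35 and a umask format of config:8-15. Check the files on your own machine, because the layout is hardware-specific. The function names are hypothetical.

```python
def parse_format(spec):
    """Parse 'config:0-7,32-35' into ('config', [(0, 7), (32, 35)])."""
    field, _, ranges = spec.strip().partition(":")
    bits = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        bits.append((int(lo), int(hi or lo)))  # a single bit such as '21' has no '-'
    return field, bits


def pack(value, spec):
    """Scatter value's bits, lowest first, into the ranges of a format string.

    Returns (field, bits) so callers only OR together values for the same field.
    """
    field, ranges = parse_format(spec)
    out = 0
    for lo, hi in ranges:
        width = hi - lo + 1
        out |= (value & ((1 << width) - 1)) << lo
        value >>= width
    if value:
        raise ValueError("value does not fit in the format's bit ranges")
    return field, out
```

Packing event code 0x28F with config:0-7,32-35 and unit mask 0x03 with config:8-15, then OR-ing the two results, gives 0x20000038F. That is the raw value used in the example commands that follow.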

For example, an AMD event with event code 0x28F and unit mask 0x03 has the raw encoding 0x20000038F, and any of the following forms can be used:

perf record -e r20000038f -a sleep 1
perf record -e cpu/r20000038f/ ...
perf record -e cpu/r0x20000038f/ ...

For PMU events specific to a given piece of hardware, consult the processor documentation to determine how to use them.

Available PMUs and their raw parameters can be viewed at the following path:

ls /sys/devices/*/format

Some PMUs are associated not with a core but with a whole CPU socket. Events on these PMUs generally cannot be sampled and can only be counted globally with perf stat -a. They can be bound to one logical CPU, but they then measure all CPUs in the same socket.

This example measures, once per second, the memory bandwidth of the first memory controller on socket 0 of an Intel Xeon system:

perf stat -C 0 -a -e uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ -I 1000 ...

Each memory controller has its own PMU. Measuring whole-system bandwidth requires specifying all imc PMUs (see the perf list output) and summing their values. To make it easier to specify many events, PMU names support prefix and glob matching, and the uncore_ prefix is ignored during matching. The command above can therefore be extended to all memory controllers with either of the following:

perf stat -C 0 -a -e imc/cas_count_read/,imc/cas_count_write/ -I 1000 ...
perf stat -C 0 -a -e '*imc*/cas_count_read/,*imc*/cas_count_write/' -I 1000 ...

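
A simplified model of the matching rule described above shows which PMUs a name selects and how their per-interval counts add up to a system-wide total. This is an illustration only, not perf's actual matching code, which handles more cases. The PMU names and counts in the usage example are made up.

```python
from fnmatch import fnmatchcase


def match_pmus(pattern, pmu_names):
    """Select PMU names by glob, or by prefix once an 'uncore_' prefix is dropped."""
    is_glob = any(ch in pattern for ch in "*?[")
    selected = []
    for name in pmu_names:
        short = name[len("uncore_"):] if name.startswith("uncore_") else name
        if is_glob:
            ok = fnmatchcase(name, pattern) or fnmatchcase(short, pattern)
        else:
            ok = short.startswith(pattern) or name == pattern
        if ok:
            selected.append(name)
    return selected


def sum_matching(counts, pattern):
    """Total one interval's per-PMU counts over every PMU the pattern selects."""
    return sum(counts[name] for name in match_pmus(pattern, counts))
```

For instance, with PMUs cpu, uncore_imc_0, uncore_imc_1 and uncore_cha_0, both "imc" and "*imc*" select the two memory controllers. Summing {"uncore_imc_0": 100, "uncore_imc_1": 250} over "imc" gives 350.
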
2.3 Parameterized performance events

Some PMU events are listed with a ? in place of a parameter value, for example:

hv_gpci/dtbp_ptitc,phys_processor_idx=?/

This means that the parameter marked with ? must be given a value when the event is used:

 perf stat -C 0 -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
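
When scripting, a small helper can fill the ? placeholders before the event is passed to perf stat -e. The function below is a hypothetical convenience, not part of perf. It formats values in hexadecimal, as in the command above.

```python
import re


def fill_params(template, **values):
    """Replace every 'name=?' in a perf list event template with name=<hex value>."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value given for parameter {name!r}")
        return f"{name}={values[name]:#x}"

    return re.sub(r"(\w+)=\?", substitute, template)
```

fill_params("hv_gpci/dtbp_ptitc,phys_processor_idx=?/", phys_processor_idx=2) returns the event string used in the command above. A missing parameter raises KeyError instead of passing a literal ? to perf.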

It is also possible to add extra qualifiers to an event, such as percore:

perf stat -e cpu/event=0,umask=0x3,percore=1/

With percore=1, the event counts of all hardware threads in a core are summed.
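
perf computes the percore sum itself, so the sketch below only illustrates the arithmetic. It assumes per-CPU counts are already available, for example from perf stat -A, along with the sysfs topology files /sys/devices/system/cpu/cpuN/topology/physical_package_id and core_id. core_id is only unique within a package, so the key pairs it with the package ID. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def core_of(cpu, root="/sys/devices/system/cpu"):
    """(package, core) key for a logical CPU, read from sysfs topology files."""
    topo = Path(root) / f"cpu{cpu}" / "topology"
    package = int((topo / "physical_package_id").read_text())
    core = int((topo / "core_id").read_text())
    return package, core


def sum_per_core(per_cpu_counts, cpu_to_core):
    """Add up the counts of all hardware threads (logical CPUs) sharing a core."""
    totals = defaultdict(int)
    for cpu, count in per_cpu_counts.items():
        totals[cpu_to_core[cpu]] += count
    return dict(totals)
```

Typical use is sum_per_core(counts, {cpu: core_of(cpu) for cpu in counts}). Two sibling threads that report 10 and 5 end up as a single core total of 15.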

2.4 Event group measurement

Perf supports time-based event multiplexing when the number of active events exceeds the number of hardware performance counters. Multiplexing can lead to measurement errors when a workload changes its execution profile.
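
When an event is multiplexed, the kernel reports how long the event was enabled and how long it actually ran on a counter (time_enabled and time_running in the perf_event_open(2) read format). perf stat then extrapolates the raw count linearly and prints the running percentage next to it. The sketch shows that estimate. It assumes the workload behaves uniformly over time, which is exactly the assumption that breaks when the execution profile changes.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Linear extrapolation of a multiplexed count to the whole enabled time."""
    if time_running == 0:
        raise ValueError("event never ran on a counter; no estimate possible")
    return raw_count * time_enabled / time_running


def running_percent(time_enabled, time_running):
    """Share of the enabled time during which the event was actually counting."""
    return 100.0 * time_running / time_enabled
```

An event that counted 1000 while running for a quarter of its enabled time is reported as about 4000. Events in the same group are scheduled together and so share the same factor, which cancels when one of them is divided by another.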

When metrics are computed from formulas over several event counts, it helps to make sure certain events are always measured together as a group, which minimizes multiplexing errors. Event groups are specified with {}:

perf stat -e '{instructions,cycles}' ...

The number of available performance counters depends on the CPU, and a group cannot contain more events than there are counters. For example, Intel Core CPUs typically have four general-purpose core counters, plus three fixed counters for instructions, cycles, and ref-cycles. Some special events are restricted in which counters they can be scheduled on, and may not support multiple instances in a single group. When a group contains too many events, some of them will not be measured.
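
A rough feasibility check follows from the counter budget above. The sketch assumes the Intel Core layout just described: four general-purpose counters plus fixed counters for instructions, cycles, and ref-cycles. It compares event names literally, so it does not treat aliases such as cpu-cycles as cycles. It ignores per-event counter constraints and counters already taken by pinned events such as the NMI watchdog. The function names are hypothetical.

```python
# Assumed fixed-counter events on Intel Core CPUs, per the paragraph above.
FIXED_COUNTER_EVENTS = {"instructions", "cycles", "ref-cycles"}


def group_spec(events):
    """Build the {a,b,...} group syntax accepted by perf stat -e / perf record -e."""
    return "{" + ",".join(events) + "}"


def group_fits(events, general_purpose=4, fixed=FIXED_COUNTER_EVENTS):
    """Rough check that a group needs no more counters than the CPU has.

    Each fixed-counter event can use its fixed counter once; every other event,
    including a repeated fixed-counter event, needs a general-purpose counter.
    """
    used_fixed = set()
    needs_general = 0
    for event in events:
        if event in fixed and event not in used_fixed:
            used_fixed.add(event)
        else:
            needs_general += 1
    return needs_general <= general_purpose
```

For example, group_spec(["instructions", "cycles"]) produces the group string used in the command above. A group of instructions and cycles plus four other events still fits, while five events that have no fixed counter do not.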

Globally pinned events can reduce the number of counters available to other groups. On x86 systems, the NMI watchdog pins one counter by default. As root, the NMI watchdog can be disabled with:

echo 0 > /proc/sys/kernel/nmi_watchdog

Events from different PMUs cannot be mixed in one group, except for software events.

perf also supports group leader sampling with the :S specifier:

perf record -e '{cycles,instructions}:S' ...
perf report --group

Normally, every event in a group is sampled. With :S, only the first event (the leader) is sampled, and each of its samples reads the values of the other events in the group. For AUX area events (such as Intel PT or CoreSight), however, the AUX area event must be the leader, so the second event is sampled instead of the first.
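
Because each leader sample also carries the current counts of the other group members (PERF_SAMPLE_READ), ratios such as instructions per cycle can be computed between consecutive samples. The sketch assumes the values have already been extracted into (cycles, instructions) tuples of cumulative counts. That tuple format and the function name are hypothetical, not perf's record layout.

```python
def interval_ipc(samples):
    """Per-interval IPC from consecutive cumulative (cycles, instructions) readings."""
    ipc = []
    for (c0, i0), (c1, i1) in zip(samples, samples[1:]):
        cycles = c1 - c0
        # An interval with no elapsed cycles has no meaningful IPC.
        ipc.append((i1 - i0) / cycles if cycles else float("nan"))
    return ipc
```

Readings of (0, 0), (1000, 2000) and (3000, 3000) give an IPC of 2.0 for the first interval and 0.5 for the second.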


Origin blog.csdn.net/Once_day/article/details/132651852