Topic: The PMU (Performance Monitoring Unit) of ARM CPUs

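Contents:

11 Performance Monitoring Unit (PMU)

11.1 About the PMU
11.2 PMU functional description
11.3 PMU register summary
11.4 PMU register descriptions
11.4.1 Performance Monitor Control Register (PMCR)
11.4.2 Performance Monitors Peripheral Identification Registers
11.4.3 Performance Monitors Component Identification Registers (PMCID)
11.5 Events
11.6 Interrupts
11.7 Exporting PMU events
11.7.1 External hardware
11.7.2 Debug trace hardware
CPU terminology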

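11 Performance Monitoring Unit (PMU)

11.1 About the PMU

The Cortex-A7 processor includes a PMU based on the PMUv2 architecture. At run time it gathers various statistics on the operation of the processor and the memory system. Each statistic is exposed as an event, and these events are useful when you debug or profile code.

The PMU provides four counters. Each counter can count any of the events available in the processor.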

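11.2 PMU functional description

The PMU consists of three main parts: the event interface, the CP15 and APB interface, and the counters.

The PMU block diagram shows how these parts fit together. Its core is the five counters: four event counters plus one cycle counter.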

Event interface

Events from all other units across the design are provided to the PMU.

CP15 and APB interface

The PMU registers can be programmed using the CP15 system control coprocessor or the external APB interface.

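Counters

The PMU has four 32-bit counters that increment when the event selected for them occurs.

The PMU has one cycle counter that increments on the processor clock.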

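11.3 PMU register summary

Access: the PMU counters and the PMU control registers are accessible through the CP15 coprocessor and through the APB interface.

Table 11-1 lists the Cortex-A7 MPCore PMU registers.

Table 11-1 PMU register summary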

Number     Offset        CRn  Op1  CRm  Op2  Name          Type  Description
0          0x000         c9   0    c13  2    PMXEVCNTR0    RW    Event Count Register
1          0x004         c9   0    c13  2    PMXEVCNTR1    RW    Event Count Register
2          0x008         c9   0    c13  2    PMXEVCNTR2    RW    Event Count Register
3          0x00C         c9   0    c13  2    PMXEVCNTR3    RW    Event Count Register
4-30       0x010-0x078   -    -    -    -    -             -     Reserved
31         0x07C         c9   0    c13  0    PMCCNTR       RW    Cycle Count Register
32-255     0x080-0x3FC   -    -    -    -    -             -     Reserved
256        0x400         c9   0    c13  1    PMXEVTYPER0   RW    Event Type Selection Register
257        0x404         c9   0    c13  1    PMXEVTYPER1   RW    Event Type Selection Register
258        0x408         c9   0    c13  1    PMXEVTYPER2   RW    Event Type Selection Register
259        0x40C         c9   0    c13  1    PMXEVTYPER3   RW    Event Type Selection Register
260-286    0x410-0x478   -    -    -    -    -             -     Reserved
287        0x47C         c9   0    c13  1    PMXEVTYPER31  RW    Performance Monitors Event Type Select Register 31
288-767    0x480-0xBFC   -    -    -    -    -             -     Reserved
768        0xC00         c9   0    c12  1    PMCNTENSET    RW    Count Enable Set Register
769-775    0xC04-0xC1C   -    -    -    -    -             -     Reserved
776        0xC20         c9   0    c12  2    PMCNTENCLR    RW    Count Enable Clear Register
777-783    0xC24-0xC3C   -    -    -    -    -             -     Reserved
784        0xC40         c9   0    c14  1    PMINTENSET    RW    Interrupt Enable Set Register
785-791    0xC44-0xC5C   -    -    -    -    -             -     Reserved
792        0xC60         c9   0    c14  2    PMINTENCLR    RW    Interrupt Enable Clear Register
793-799    0xC64-0xC7C   -    -    -    -    -             -     Reserved
800        0xC80         c9   0    c12  3    PMOVSR        RW    Overflow Flag Status Register
801-807    0xC84-0xC9C   -    -    -    -    -             -     Reserved
808        0xCA0         c9   0    c12  4    PMSWINC       WO    Software Increment Register
809-895    0xCA4-0xDFC   -    -    -    -    -             -     Reserved
896        0xE00         -    -    -    -    PMCFGR        RO    Performance Monitor Configuration Register
897        0xE04         c9   0    c12  0    PMCR          RW    Performance Monitor Control Register, see section 11.4.1
898        0xE08         c9   0    c14  0    PMUSERENR     RW    User Enable Register
899-903    0xE0C-0xE1C   -    -    -    -    -             -     Reserved
904        0xE20         c9   0    c12  6    PMCEID0       RO    Common Event Identification Register 0
905        0xE24         c9   0    c12  7    PMCEID1       RO    Common Event Identification Register 1
906-1003   0xE28-0xFAC   -    -    -    -    -             -     Reserved
1004       0xFB0         -    -    -    -    PMLAR         WO    Lock Access Register
1005       0xFB4         -    -    -    -    PMLSR         RO    Lock Status Register
1006       0xFB8         -    -    -    -    PMAUTHSTATUS  RO    Authentication Status Register
1007-1010  0xFBC-0xFC8   -    -    -    -    -             -     Reserved
1011       0xFCC         -    -    -    -    PMDEVTYPE     RO    Device Type Register
1012       0xFD0         -    -    -    -    PMPID4        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1013       0xFD4         -    -    -    -    PMPID5        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1014       0xFD8         -    -    -    -    PMPID6        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1015       0xFDC         -    -    -    -    PMPID7        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1016       0xFE0         -    -    -    -    PMPID0        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1017       0xFE4         -    -    -    -    PMPID1        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1018       0xFE8         -    -    -    -    PMPID2        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1019       0xFEC         -    -    -    -    PMPID3        RO    Performance Monitors Peripheral Identification Register, see section 11.4.2
1020       0xFF0         -    -    -    -    PMCID0        RO    Performance Monitors Component Identification Register, see section 11.4.3
1021       0xFF4         -    -    -    -    PMCID1        RO    Performance Monitors Component Identification Register, see section 11.4.3
1022       0xFF8         -    -    -    -    PMCID2        RO    Performance Monitors Component Identification Register, see section 11.4.3
1023       0xFFC         -    -    -    -    PMCID3        RO    Performance Monitors Component Identification Register, see section 11.4.3

Registers whose description does not cite a section of this article are described in the ARM Architecture Reference Manual. A dash in the CRn, Op1, CRm, and Op2 columns means the table gives no CP15 encoding for that register.

Software running on the processor operates the PMU by accessing the CP15 coprocessor with the MRC and MCR instructions.
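The relationship between the columns of Table 11-1 can be sketched in a few lines of Python. This is only a model of the table for use on a host machine, not code that runs on the target: the dictionary holds a small subset of the rows, and the helper names `apb_offset` and `cp15_access` are made up for this illustration. It shows that the APB offset is the register number multiplied by 4, and how the CRn, Op1, CRm, and Op2 columns map onto the operands of MRC and MCR.

```python
# Subset of Table 11-1: name -> (register number, CRn, Op1, CRm, Op2).
PMU_REGS = {
    "PMCCNTR":    (31,  9, 0, 13, 0),
    "PMCNTENSET": (768, 9, 0, 12, 1),
    "PMCNTENCLR": (776, 9, 0, 12, 2),
    "PMOVSR":     (800, 9, 0, 12, 3),
    "PMCR":       (897, 9, 0, 12, 0),
    "PMUSERENR":  (898, 9, 0, 14, 0),
}


def apb_offset(name):
    """APB offset of a register: each register occupies 4 bytes."""
    number = PMU_REGS[name][0]
    return number * 4


def cp15_access(name, rt="r0", write=False):
    """Format the MRC (read) or MCR (write) instruction for a register."""
    _, crn, op1, crm, op2 = PMU_REGS[name]
    mnemonic = "MCR" if write else "MRC"
    return f"{mnemonic} p15, {op1}, {rt}, c{crn}, c{crm}, {op2}"


print(hex(apb_offset("PMCR")))                # 0xe04, as in Table 11-1
print(cp15_access("PMCR"))                    # read PMCR
print(cp15_access("PMCNTENSET", write=True))  # write PMCNTENSET
```

The formatted string for PMCR matches the access instruction given in section 11.4.1.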

11.4 PMU register descriptions

This section describes the Cortex-A7 MPCore PMU registers. Table 11-1 gives an overview of them.

11.4.1 Performance Monitor Control Register (PMCR)

Access:

To access the PMCR, read or write the CP15 register as follows:

MRC p15, 0, <Rt>, c9, c12, 0 ; Read PMCR

MCR p15, 0, <Rt>, c9, c12, 0 ; Write PMCR

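The PMCR is characterized by the following four aspects:

Purpose:

Provides details of the performance monitor implementation, including the number of counters implemented.

Configures and controls the counters.

Usage constraints:

A read/write register.

Common to the Secure and Non-secure states.

Accessible from PL1 or higher.

Accessible in User mode only when the PMUSERENR.EN bit is set to 1.

Configurations:

Available in all configurations.

Attributes:

See the PMU register summary in Table 11-1.

Figure 11-2 and Table 11-2 show the PMCR bit assignments.

Figure 11-2 PMCR bit assignments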

Table 11-2 PMCR bit assignments
Bits  Name  Function
[31:24]  IMP  Implementer code.
    0x41 ARM.
    This is a read-only field.
[23:16]  IDCODE  Identification code.
    0x07 Cortex-A7 MPCore identification code.
    This is a read-only field.
[15:11]  N  Number of event counters. In Secure state and Hyp mode, this field returns 0x4, which indicates the number of counters implemented.
    In Non-secure modes other than Hyp mode, this field reads the value of HDCR.HPMN. See Hyp Debug Control Register on page 4-68.
    This is a read-only field.
[10:6]  -  Reserved, UNK/SBZP.
[5]  DP  Disable cycle counter, PMCCNTR, in regions of software where counting is prohibited:
    0 Count is enabled in prohibited regions. This is the reset value.
    1 Count is disabled in prohibited regions.
    This bit is read/write.
[4]  X  Export enable. This bit permits events to be exported to another debug device, such as a trace macrocell, over an event bus:
    0 Export of events is disabled. This is the reset value.
    1 Export of events is enabled.
    This bit is read/write.
[3]  D  Clock divider:
    0 When enabled, PMCCNTR counts every clock cycle. This is the reset value.
    1 When enabled, PMCCNTR counts once every 64 clock cycles.
    This bit is read/write.
[2]  C  Clock counter reset:
    0 No action. This is the reset value.
    1 Reset PMCCNTR to 0.
    This bit is write-only, and always RAZ.
[1]  P  Event counter reset:
    0 No action. This is the reset value.
    1 Reset all event counters, not including PMCCNTR, to 0.
    In Non-secure modes other than Hyp mode, writing a 1 to this bit does not reset event counters that the HDCR.HPMN field reserves for Hyp mode use. See Hyp Debug Control Register on page 4-68.
    In Secure state and Hyp mode, writing a 1 to this bit resets all event counters.
    This bit is write-only, and always RAZ.
[0]  E  Enable bit. Performance monitor overflow IRQs are only signaled when the enable bit is set to 1.
    0 All counters, including PMCCNTR, are disabled. This is the reset value.
    1 All counters are enabled.
    This bit is read/write.
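To make the bit layout of Table 11-2 concrete, here is a minimal Python sketch that decodes a PMCR value and builds the value to write back in order to reset and enable the counters. It models the register on a host machine only. The short field names N, X, D, C, P, and E follow the ARM Architecture Reference Manual; the function names and the sample value `0x41072000` (IMP 0x41, IDCODE 0x07, N 4, everything else at reset) are assumptions for this illustration.

```python
# PMCR bit masks, positions taken from Table 11-2.
PMCR_E = 1 << 0    # enable all counters
PMCR_P = 1 << 1    # reset event counters (write-only)
PMCR_C = 1 << 2    # reset PMCCNTR (write-only)
PMCR_D = 1 << 3    # PMCCNTR counts once every 64 cycles
PMCR_X = 1 << 4    # export events on the event bus
PMCR_DP = 1 << 5   # disable PMCCNTR in prohibited regions


def decode_pmcr(value):
    """Split a 32-bit PMCR value into its readable fields."""
    return {
        "IMP": (value >> 24) & 0xFF,
        "IDCODE": (value >> 16) & 0xFF,
        "N": (value >> 11) & 0x1F,
        "DP": bool(value & PMCR_DP),
        "X": bool(value & PMCR_X),
        "D": bool(value & PMCR_D),
        "E": bool(value & PMCR_E),
    }


def start_value(old, divide_by_64=False):
    """Value to write so that all counters are reset and then enabled."""
    new = old | PMCR_E | PMCR_P | PMCR_C
    if divide_by_64:
        new |= PMCR_D
    else:
        new &= ~PMCR_D
    return new & 0xFFFFFFFF


sample = 0x41072000  # hypothetical read-back value, not measured on hardware
print(decode_pmcr(sample))
print(hex(start_value(sample)))
```

On the target, the value from `start_value` would be written with the MCR instruction shown in the Access paragraph of this section. When the D bit is set, a cycle count read from PMCCNTR has to be multiplied by 64 to obtain clock cycles.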

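11.4.2 Performance Monitors Peripheral Identification Registers

The Performance Monitors Peripheral Identification Registers provide the standard information required of every component that conforms to the ARM PMUv2 architecture. They form a set of eight registers, listed in Table 11-3.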

Table 11-3 Summary of the Performance Monitors Peripheral Identification Registers

Register                             Value  Offset
Performance Monitors Peripheral ID4  0x04   0xFD0
Performance Monitors Peripheral ID5  0x00   0xFD4
Performance Monitors Peripheral ID6  0x00   0xFD8
Performance Monitors Peripheral ID7  0x00   0xFDC
Performance Monitors Peripheral ID0  0xA7   0xFE0
Performance Monitors Peripheral ID1  0xB9   0xFE4
Performance Monitors Peripheral ID2  0x4B   0xFE8
Performance Monitors Peripheral ID3  0x00   0xFEC

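Only bits [7:0] of each PMPID register are used; bits [31:8] are reserved. Together, the eight PMPID registers define a single 64-bit Peripheral ID.

The ARM Architecture Reference Manual describes these registers in detail.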
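As a rough illustration of how eight 8-bit values form one 64-bit Peripheral ID, the sketch below concatenates the values from Table 11-3. The byte order (ID0 as the least significant byte, ID7 as the most significant) and the 12-bit part number split across ID0 and ID1 follow the usual CoreSight identification convention; the article itself does not spell this out, so treat both as assumptions. The function names are made up for this example.

```python
# Values from Table 11-3, indexed by register number (PMPID0 .. PMPID7).
PMPID = [0xA7, 0xB9, 0x4B, 0x00, 0x04, 0x00, 0x00, 0x00]


def peripheral_id(regs):
    """Concatenate bits [7:0] of the eight registers, ID0 least significant."""
    value = 0
    for index, reg in enumerate(regs):
        value |= (reg & 0xFF) << (8 * index)
    return value


def part_number(regs):
    """12-bit part number: ID1[3:0] above ID0[7:0] (assumed convention)."""
    return ((regs[1] & 0x0F) << 8) | (regs[0] & 0xFF)


print(hex(peripheral_id(PMPID)))
print(hex(part_number(PMPID)))
```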

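11.4.3 Performance Monitors Component Identification Registers (PMCID)

There are four read-only Performance Monitors Component Identification Registers, Component ID0 through Component ID3. Table 11-4 shows these registers.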

Table 11-4 Summary of the Performance Monitors Component ID Registers

Register                            Value  Offset
Performance Monitors Component ID0  0x0D   0xFF0
Performance Monitors Component ID1  0x90   0xFF4
Performance Monitors Component ID2  0x05   0xFF8
Performance Monitors Component ID3  0xB1   0xFFC

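The Performance Monitors Component Identification Registers identify the performance monitor as an ARM PMUv2 architecture component.

The ARM Architecture Reference Manual describes these registers in detail.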
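A debugger that scans the APB address space can use these four bytes to recognize the component. The sketch below assumes the common CoreSight layout, in which the concatenated Component ID reads 0xB105x00D and the nibble x (bits [15:12], taken from Component ID1[7:4]) is the component class. That layout is an assumption drawn from the CoreSight convention rather than something this article states, and the helper names are hypothetical.

```python
# Values from Table 11-4, indexed by register number (PMCID0 .. PMCID3).
PMCID = [0x0D, 0x90, 0x05, 0xB1]


def component_id(regs):
    """Concatenate bits [7:0] of the four registers, ID0 least significant."""
    value = 0
    for index, reg in enumerate(regs):
        value |= (reg & 0xFF) << (8 * index)
    return value


def has_standard_preamble(cid):
    """Check the fixed bits, ignoring the class nibble in bits [15:12]."""
    return (cid & 0xFFFF0FFF) == 0xB105000D


def component_class(cid):
    """Return the class nibble held in bits [15:12]."""
    return (cid >> 12) & 0xF


cid = component_id(PMCID)
print(hex(cid), has_standard_preamble(cid), hex(component_class(cid)))
```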

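11.5 Events

Table 11-5 shows the event numbers that the PMU uses, together with the bit position of each event on the PMUEVENT bus. Event numbers that are not listed are reserved.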

Table 11-5 Performance monitor events
Event ID  PMUEVENT bit position  Description
0x00  -  Software increment. The register is incremented only on writes to the Software Increment Register. See the ARM Architecture Reference Manual.
0x01  [0]  Instruction fetch that causes a refill at (at least) the lowest level of instruction or unified cache. Includes the speculative linefills in the count.
0x02  [1]  Instruction fetch that causes a TLB refill at (at least) the lowest level of TLB. Includes the speculative requests in the count.
0x03  [2]  Data read or write operation that causes a refill at (at least) the lowest level of data or unified cache. Counts the number of allocations performed in the Data Cache because of a read or a write.
0x04  [3]  Data read or write operation that causes a cache access at (at least) the lowest level of data or unified cache. This includes speculative reads.
0x05  [4]  Data read or write operation that causes a TLB refill at (at least) the lowest level of TLB. This does not include micro TLB misses because of PLD, PLI, CP15 Cache operation by MVA and CP15 VA to PA operations.
0x06  [5]  Data read architecturally executed. Counts the number of data read instructions accepted by the Load Store Unit. This includes counting the speculative and aborted LDR/LDM, and the reads because of the SWP instructions.
0x07  [6]  Data write architecturally executed. Counts the number of data write instructions accepted by the Load Store Unit. This includes counting the speculative and aborted STR/STM, and the writes because of the SWP instructions.
0x08  [7]  Instruction architecturally executed.
0x09  [8]  Exception taken. Counts the number of exceptions architecturally taken.
0x0A  [9]  Exception return architecturally executed. The following instructions are reported on this event: LDM {... pc}^, RFE, DP S pc.
0x0B  [10]  Change to ContextID retired. Counts the number of instructions architecturally executed writing into the ContextID Register.
0x0C  [11]  Software change of PC.
0x0D  [12]  Immediate branch architecturally executed (taken or not taken). This includes the branches which are flushed due to a previous load/store which aborts late.
0x0E  [13]  Procedure return (other than exception returns) architecturally executed.
0x0F  [14]  Unaligned load-store.
0x10  [15]  Branch mispredicted/not predicted. Counts the number of mispredicted or not-predicted branches executed. This includes the branches which are flushed because of a previous load/store which aborts late.
0x11  -  Cycle counter.
0x12  [16]  Branches or other change in program flow that could have been predicted by the branch prediction resources of the processor. This includes the branches which are flushed because of a previous load/store which aborts late.
0x13  [17]  Data memory access.
0x14  [18]  Instruction Cache access.
0x15  [19]  Data cache eviction.
0x16  -  Level 2 data cache access.
0x17  -  Level 2 data cache refill.
0x18  -  Level 2 data cache write-back. Data transfers made as a result of a coherency request from the Level 2 caches to outside of the Level 1 and Level 2 caches are not counted. Write-backs made as a result of CP15 cache maintenance operations are counted.
0x19  -  Bus accesses. Single transfer bus accesses on either of the ACE read or write channels might increment twice in one cycle if both the read and write channels are active simultaneously. Operations that utilise the bus that do not explicitly transfer data, such as barrier or coherency operations, are counted as bus accesses.
0x1D  -  Bus cycle.
0x60  -  Bus access, read.
0x61  -  Bus access, write.
0x86  [20]  IRQ exception taken.
0x87  [21]  FIQ exception taken.
0xC0  [22]  External memory request.
0xC1  [23]  Non-cacheable external memory request.
0xC2  [24]  Linefill because of prefetch.
0xC3  [25]  Prefetch linefill dropped.
0xC4  [26]  Entering read allocate mode.
0xC5  [27]  Read allocate mode.
0xC6  [28]  Reserved.
0xC7  -  ETM Ext Out[0].
0xC8  -  ETM Ext Out[1].
0xC9  [29]  Data Write operation that stalls the pipeline because the store buffer is full.
0xCA  -  Data snooped from other processor. This event counts memory-read operations that read data from another processor within the local Cortex-A7 cluster, rather than accessing the L2 cache or issuing an external read. It increments on each transaction, rather than on each beat of data.
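Two small calculations come up whenever this table is used, and the following Python sketch illustrates them on a host machine. First, it rebuilds the mapping from PMUEVENT bit position to event number given in Table 11-5, so that a sampled bus value can be decoded. Second, it computes the difference between two readings of a 32-bit counter with wrap-around, and the data cache refill ratio from events 0x03 and 0x04. All function names are hypothetical, and the numbers passed in are made-up inputs, not measurements.

```python
# PMUEVENT bit position -> event number, following Table 11-5.
BIT_TO_EVENT = (
    list(range(0x01, 0x11))       # bits 0-15:  events 0x01-0x10
    + [0x12, 0x13, 0x14, 0x15]    # bits 16-19
    + [0x86, 0x87]                # bits 20-21: IRQ and FIQ exception taken
    + list(range(0xC0, 0xC7))     # bits 22-28: events 0xC0-0xC6
    + [0xC9]                      # bit 29
)


def events_on_bus(pmuevent):
    """List the event numbers whose PMUEVENT bit is set in a sampled value."""
    return [event for bit, event in enumerate(BIT_TO_EVENT)
            if pmuevent & (1 << bit)]


def delta32(start, end):
    """Events counted between two reads of a 32-bit counter, allowing one wrap."""
    return (end - start) & 0xFFFFFFFF


def refill_ratio(refills, accesses):
    """Data cache refills (event 0x03) per data cache access (event 0x04)."""
    return refills / accesses if accesses else 0.0


print([hex(e) for e in events_on_bus((1 << 2) | (1 << 20))])
print(delta32(0xFFFFFFF0, 0x00000010))
print(refill_ratio(25, 100))
```

`delta32` is only correct if the counter wrapped at most once between the two reads; for longer intervals the overflow flags in PMOVSR have to be taken into account.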


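11.6 Interrupts

When the PMU generates an interrupt, the Cortex-A7 MPCore processor asserts the nPMUIRQ signal. You can route this signal to an interrupt controller for prioritization and masking. This is the only mechanism that signals this interrupt to the processor.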
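An interrupt handler typically reads the Overflow Flag Status Register (PMOVSR in Table 11-1) to find out which counter overflowed. The Python sketch below models that step on a host machine. It assumes the layout defined in the ARM Architecture Reference Manual, which this article does not reproduce: one flag per event counter starting at bit 0, and bit 31 for the cycle counter PMCCNTR. The class `Counter64` is a hypothetical helper that extends a 32-bit hardware counter to 64 bits by counting overflows.

```python
NUM_EVENT_COUNTERS = 4   # PMCR.N in Secure state, see Table 11-2
CYCLE_COUNTER_BIT = 31   # assumed position of the PMCCNTR overflow flag


def overflowed(pmovsr):
    """Name the counters whose overflow flag is set in a PMOVSR value."""
    names = [f"event counter {i}"
             for i in range(NUM_EVENT_COUNTERS)
             if pmovsr & (1 << i)]
    if pmovsr & (1 << CYCLE_COUNTER_BIT):
        names.append("PMCCNTR")
    return names


class Counter64:
    """Software-extended 64-bit view of one 32-bit hardware counter."""

    def __init__(self):
        self.overflows = 0

    def on_overflow(self):
        # Called from the interrupt handler once per overflow of this counter.
        self.overflows += 1

    def value(self, hardware_count):
        return (self.overflows << 32) | (hardware_count & 0xFFFFFFFF)


cycles = Counter64()
for name in overflowed(0x80000005):   # made-up status value
    print(name)
    if name == "PMCCNTR":
        cycles.on_overflow()
print(hex(cycles.value(0x10)))
```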

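11.7 Exporting PMU events

11.7.1 External hardware

In addition to feeding the counters in the processor, some of the events in Table 11-5 are exported on the PMUEVENT bus, which you can connect to external hardware.

11.7.2 Debug trace hardware

Some of the events in Table 11-5 can be exported to other external debug or trace hardware. See the CoreSight SoC Technical Reference Manual for more information.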

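CPU terminology:

1. Memory barriers: a set of processor instructions used to enforce ordering constraints on memory operations.

2. Cache line: the smallest unit of storage that can be allocated in a CPU cache. When the processor fills a cache line it loads the entire line, which on a modern CPU takes about as long as executing several hundred instructions.

3. Atomic operations: an operation, or a series of operations, that cannot be interrupted.

4. Cache line fill: when the processor recognizes that an operand being read from memory is cacheable, it reads the entire cache line into the appropriate cache (L1, L2, L3, or all of them).

5. Cache hit: if the memory location that was loaded by a cache line fill is still in the cache the next time the processor accesses that address, the processor reads the operand from the cache instead of from memory.

6. Write hit: when the processor writes an operand back to a cacheable memory region, it first checks whether a cache line for that memory address exists in the cache. If a valid cache line exists, the processor writes the operand to the cache instead of to memory. This operation is called a write hit.

7. Write miss (write misses the cache): a write to a memory location for which no valid cache line is present in the cache.

8. Compare and swap (CAS): a CAS operation takes two values, an old value (the value expected before the operation) and a new value. The operation first checks whether the old value has changed. Only if it has not changed is it replaced with the new value; if it has changed, no swap takes place.

9. CPU pipeline: a CPU pipeline works like an industrial assembly line. Five or six circuit units with different functions form an instruction pipeline, and each x86 instruction is split into five or six steps that these units execute in turn. This allows one instruction to complete per CPU clock cycle, which raises the processing speed of the CPU.

10. Memory order violation: memory order violations are usually caused by false sharing. False sharing occurs when several CPUs modify different parts of the same cache line at the same time, which invalidates the operation of one of them. When a memory order violation occurs, the CPU must flush its pipeline.【1】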
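The compare-and-swap operation in item 8 can be sketched as follows. Python has no hardware CAS instruction, so this example uses a lock as a stand-in for the atomicity the CPU would provide; the class `AtomicInt` and the function `increment` are hypothetical names for this illustration. The retry loop in `increment` is the typical way CAS is used: read the old value, compute the new one, and try again if another thread changed the value in between.

```python
import threading


class AtomicInt:
    """Integer cell with a compare-and-swap operation (lock-based model)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store new only if the current value still equals expected."""
        with self._lock:
            if self._value != expected:
                return False            # value changed: no swap
            self._value = new
            return True


def increment(counter):
    """Add 1 with a CAS retry loop and return the value that was stored."""
    while True:
        old = counter.load()
        if counter.compare_and_swap(old, old + 1):
            return old + 1


def worker(counter, repeats):
    for _ in range(repeats):
        increment(counter)


counter = AtomicInt()
threads = [threading.Thread(target=worker, args=(counter, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```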

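References:

【1】 "Java底层实现,CPU还有10个术语!" (Java low-level implementation: 10 CPU terms)

http://www.elecfans.com/d/653680.html

The copyright of this article belongs to the author. If you repost it, please credit the source (this CSDN blog). Thank you.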



Reposted from blog.csdn.net/chichi123137/article/details/80145914