Back to basics: understanding the CPU Cache and MESI cache coherency

I. Introduction

I had intended to revisit how volatile is implemented, which involves instruction reordering and data visibility. Understanding both is inseparable from understanding the CPU Cache, so this post first reviews the CPU Cache.

II. Why we need a CPU Cache

CPU performance has grown at the pace described by Moore's Law (though recently more and more voices say it has come to an end), roughly doubling every 18-24 months. Memory, in contrast, has developed far more slowly, and the performance gap between CPU and memory keeps widening. To cushion the speed difference between the two, a three-level cache (L1, L2, L3) built from SRAM is placed between the CPU and memory to improve the CPU's computational efficiency. Of course, it is not that memory cannot be made fast; it is a trade-off between cost and capacity.

In 1965, Intel co-founder Gordon Moore proposed what later became known as "Moore's Law": the number of components that can fit on an integrated circuit doubles every 18-24 months, and performance doubles along with it.


It is like grocery shopping: we usually buy only a few things, so the actual time spent shopping is short, while getting there and queuing to pay take up most of the time. For cost reasons, it is impossible to put a supermarket in every residential block, so vending machines downstairs and neighborhood convenience stores were introduced, which naturally improves shopping efficiency.

III. The three-level cache structure: L1, L2, L3

The three levels of cache are integrated into the CPU and organized as follows: each core has its own L1 Cache and L2 Cache, while the L3 Cache is shared by all cores. The L1 Cache is closest to the execution units, and its speed is generally close to theirs; L2 and L3 follow in turn. In addition, the L1 Cache is usually split into an L1d data cache and an L1i instruction cache, which reduces the cache contention caused by multi-core / multi-threaded competition.

When reading data, the access is hierarchical: the execution unit first accesses L1; if the data is not in L1, it accesses L2; if not in L2, it accesses L3; and finally, if not in L3, it accesses memory.
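To make that lookup order concrete, here is a purely illustrative Java sketch; the CacheLevel class, the map-backed "memory", and the fill-every-level policy are simplifications of my own, not how real hardware behaves:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A purely illustrative model of the L1 -> L2 -> L3 -> memory lookup order.
// It ignores line granularity, associativity, and timing.
public class CacheLookupSketch {

    static class CacheLevel {
        final String name;
        final Map<Long, Long> lines = new HashMap<>(); // address -> data

        CacheLevel(String name) { this.name = name; }

        Optional<Long> read(long address) {
            return Optional.ofNullable(lines.get(address));
        }
    }

    private final CacheLevel[] levels = {
            new CacheLevel("L1"), new CacheLevel("L2"), new CacheLevel("L3")
    };
    private final Map<Long, Long> memory = new HashMap<>();

    long load(long address) {
        // Try each cache level in order; fall back to memory on a full miss.
        for (CacheLevel level : levels) {
            Optional<Long> hit = level.read(address);
            if (hit.isPresent()) {
                System.out.println("hit in " + level.name);
                return hit.get();
            }
        }
        System.out.println("miss in every level, reading memory");
        long value = memory.getOrDefault(address, 0L);
        for (CacheLevel level : levels) {
            level.lines.put(address, value); // simplified: fill every level on the way back
        }
        return value;
    }

    public static void main(String[] args) {
        CacheLookupSketch cpu = new CacheLookupSketch();
        cpu.memory.put(0x40L, 42L);
        cpu.load(0x40L); // miss everywhere, loads from memory and fills the caches
        cpu.load(0x40L); // hit in L1
    }
}
```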


The three levels of cache are generally small. Taking my machine's i5-8259U as an example:

i5-8259U:

L1 Data Cache: 32.0 KB x 4
L1 Instruction Cache: 32.0 KB x 4
L2 Cache: 256 KB x 4
L3 Cache: 6.00 MB
L4 Cache: 0.00 B
Memory: 16.0 GB 2133 MHz LPDDR3

IV. Cache Line: the smallest unit of data exchange with memory

A Cache is divided into N Cache Lines, each typically 32 or 64 bytes in size; the Cache Line is the smallest unit of data exchange between the cache and memory. A Cache Line consists of at least three parts: Valid, Tag, and Block. The Block stores the data, the Tag identifies the memory address, and the Valid flag indicates whether the data is valid.
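A minimal sketch of those three fields in Java, assuming the 64-byte line size mentioned above (real lines also carry coherency state bits, and the tag is only the upper portion of the address):

```java
// A minimal model of a single cache line: Valid + Tag + Block.
// 64 bytes is one of the common line sizes mentioned above; real hardware
// also keeps coherency state bits, and the tag is only part of the address.
public class CacheLine {
    static final int LINE_SIZE = 64;

    boolean valid;                            // is the data in this line usable?
    long tag;                                 // which memory block is cached here
    final byte[] block = new byte[LINE_SIZE]; // the cached data itself

    boolean matches(long addressTag) {
        return valid && tag == addressTag;    // a hit requires valid data and a matching tag
    }
}
```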


When a CPU core accesses data and finds it in a Cache Line whose Valid flag is set, that is a cache hit; otherwise it is a cache miss. Typically, the cost difference between a hit and a miss is on the order of a few hundred core clock cycles. Therefore, to raise the cache hit ratio, a rational and effective data placement and replacement policy is essential to the CPU's computational efficiency. It is just like the convenience store stocking high-frequency goods according to the shopping habits of the neighborhood's residents, and adjusting as customer preferences evolve.

CPU caches place data mainly according to spatial locality and temporal locality.

Spatial locality: if a memory location is accessed, data at nearby locations is likely to be accessed soon as well.

Temporal locality: if a memory location is accessed, it is likely to be accessed again repeatedly in the near future.
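Spatial locality is easy to observe from ordinary Java code: summing a 2D array row by row touches consecutive addresses and reuses each cache line that has been loaded, while summing it column by column keeps jumping between rows. The rough timing sketch below is my own and deliberately unscientific; the exact numbers depend on the machine and the JIT, but row-major traversal is usually noticeably faster:

```java
// Rough demonstration of spatial locality: row-major traversal reads
// consecutive memory and reuses each cache line, while column-major
// traversal strides across rows and misses far more often.
public class LocalityDemo {
    static final int N = 4096;
    static final int[][] data = new int[N][N];

    static long sumRowMajor() {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += data[i][j];   // consecutive addresses within a row
        return sum;
    }

    static long sumColumnMajor() {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += data[i][j];   // one whole row of stride between reads
        return sum;
    }

    public static void main(String[] args) {
        // Warm up once so the JIT compiles both methods before timing.
        long checksum = sumRowMajor() + sumColumnMajor();

        long t1 = System.nanoTime();
        checksum += sumRowMajor();
        long t2 = System.nanoTime();
        checksum += sumColumnMajor();
        long t3 = System.nanoTime();

        System.out.printf("row-major:    %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("column-major: %d ms%n", (t3 - t2) / 1_000_000);
        System.out.println("checksum (ignore): " + checksum); // keeps the loops from being optimized away
    }
}
```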

There are generally three cache replacement strategies: FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used):

FIFO: First In First Out; entries are evicted in the order they entered the cache, the earliest first.

LRU: Least Recently Used; usage of the cached data is tracked and the least recently used entry is evicted. This is the most widely used of the three; a minimal sketch follows after the list.

LFU: Least Frequently Used; over a period of time, the least frequently used entry is evicted.

(Exercises for the LRU and LFU algorithms: LeetCode LRU, LeetCode LFU.)
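LRU is the easiest of the three to sketch in Java, because LinkedHashMap's access-order mode and removeEldestEntry hook already do most of the work. The following is a minimal sketch, not a production-grade cache and certainly not the structure real hardware uses:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap's access-order mode:
// every get/put moves the entry to the tail, and once capacity is
// exceeded the head (the least recently used entry) is evicted.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.get(1);        // 1 is now the most recently used
        cache.put(3, "c");   // evicts 2, the least recently used
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

For the LeetCode exercises above, the usual hand-rolled solution is a hash map plus a doubly linked list, which is essentially what LinkedHashMap maintains internally.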

V. MESI: cache coherency

The introduction of multi-level caches, combined with effective data placement and replacement policies, greatly improves the CPU's computational efficiency, but it also brings a cache coherency problem.

5.1 The underlying operations

To keep the caches consistent, the CPU provides two low-level operations: Write Invalidate and Write Update.

Write Invalidate: when a core modifies a piece of data, any copies of that data held by other cores are marked invalid.

Write Update: when a core modifies a piece of data, any copies of that data held by other cores are updated to the new value.

Write Invalidate is simpler to implement, and the other cores do not need to follow the data change; the drawback is that the Valid flag applies to a whole Cache Line, so other data in the same line that was still valid is invalidated along with it. Write Update generates a lot of update traffic, but it only needs to update the modified data rather than an entire Cache Line. Most processors use Write Invalidate.
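Below is a toy Java sketch of the difference between the two operations; the CoreCopy class standing in for each core's private copy is my own invention, and only the bookkeeping is shown, while real hardware does this through bus snooping:

```java
import java.util.List;

// Toy illustration of Write Invalidate vs. Write Update. Each "core" holds a
// private copy of one value plus a valid flag; only the bookkeeping difference
// is shown here, not the bus snooping that real hardware uses.
public class WritePolicySketch {

    static class CoreCopy {
        long value;
        boolean valid = true;
    }

    // Write Invalidate: the writer keeps the new value, every other copy is invalidated.
    static void writeInvalidate(List<CoreCopy> copies, int writer, long newValue) {
        for (int i = 0; i < copies.size(); i++) {
            if (i == writer) {
                copies.get(i).value = newValue;
            } else {
                copies.get(i).valid = false;
            }
        }
    }

    // Write Update: every copy receives the new value.
    static void writeUpdate(List<CoreCopy> copies, int writer, long newValue) {
        for (CoreCopy copy : copies) {
            copy.value = newValue;
            copy.valid = true;
        }
    }

    public static void main(String[] args) {
        List<CoreCopy> copies = List.of(new CoreCopy(), new CoreCopy(), new CoreCopy());
        writeInvalidate(copies, 0, 42);
        System.out.println("core 1 valid after invalidate: " + copies.get(1).valid); // false
        writeUpdate(copies, 0, 7);
        System.out.println("core 1 value after update: " + copies.get(1).value);     // 7
    }
}
```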

These two operations also mirror the caching strategies we programmers commonly use at the application level: when the underlying data changes, either invalidate the cache entry or update it in place. In addition, for cached data with weaker freshness requirements, a fixed expiry time (automatic TTL-based expiration) is also a common strategy.

5.2 The MESI protocol

Write Invalidate offers a simple idea for solving cache coherency, but a concrete implementation requires a complete protocol. The classic one, often used in textbooks, is the MESI protocol; many later protocols are extensions of MESI.

Unlike the conventional Cache Line structure described above, under the MESI protocol a Cache Line uses two bits in its header to encode one of four MESI states.


These four states are:

M (Modified): the data has been modified; it is a valid state, but the data exists only in this Cache and is inconsistent with memory.

E (Exclusive): the data is held exclusively; it is a valid state, the data exists only in this Cache and is consistent with memory.

S (Shared): the data is not exclusive; it is a valid state, the data exists in multiple Caches and is consistent with memory.

I (Invalid): the data is invalid.
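As a minimal sketch, the four states and a two-bit encoding could be written in Java as follows; the concrete bit patterns are only an illustration, since actual hardware encodings vary:

```java
// The four MESI states with an illustrative two-bit encoding.
// The concrete bit patterns are made up for this sketch; hardware encodings vary.
public enum MesiState {
    MODIFIED (0b11), // valid, dirty: only this cache holds the up-to-date data
    EXCLUSIVE(0b10), // valid, clean: only this cache holds the line
    SHARED   (0b01), // valid, clean: several caches hold the line
    INVALID  (0b00); // the line holds no usable data

    public final int bits;

    MesiState(int bits) { this.bits = bits; }
}
```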

The following schematic, taken from the book "Big Talk Processor", makes the four states easy to understand:

(Figure: the four MESI states)

Note: in the E and S states, the cached data is consistent with memory. When a core modifies the data in its Cache, the data is not immediately written back to memory; instead, that Cache Line is marked M and the copies in the other cores are marked I. At that point the data is inconsistent with memory.

The following is a schematic of the transitions between the four states:

(Figure: MESI state transition diagram)

From the state transition diagram, note: whatever state a Cache Line is currently in, a Local Write sets its state to Modified and the corresponding lines in the other caches (if they were in S) are all set to Invalid, pending a later write-back to memory; likewise, a Remote Write sets all other copies of the data to Invalid, which effectively means the cached data has to be rebuilt.

Next, the transitions out of each of the four states are described in turn:

When the state is Invalid:

Invalid + Local Read → Exclusive or Shared: if no other cache holds the data, it is read from memory and this line is set to E; if another cache holds it in the M state, that cache first writes the data back to memory, then the data is read and both lines are set to S; if another cache holds it in E or S, the data is read and this line is set to S.
Invalid + Local Write → Modified: the data is first fetched (with a write-back first if another cache holds it in M), the copies in the other caches are invalidated, and this line is modified and set to M.
Invalid + Remote Read → Invalid: another core reading the data does not affect this line.
Invalid + Remote Write → Invalid: another core writing the data does not affect this line.

When the state is Exclusive:

Exclusive + Local Read → Exclusive: the core reads its own exclusive data; the state is unchanged.
Exclusive + Local Write → Modified: the data now differs from memory, so the line is set to M, pending a write-back.
Exclusive + Remote Read → Shared: another core reads the data, so it is no longer exclusive; both lines are set to S.
Exclusive + Remote Write → Invalid: another core modifies the data, so this copy is invalidated.

When the state is Shared:

Shared + Local Read → Shared: the core reads the shared data; nothing changes.
Shared + Local Write → Modified: this line is set to M and the copies in the other caches are invalidated; the data differs from memory until it is written back.
Shared + Remote Read → Shared: another core reads the shared data; nothing changes.
Shared + Remote Write → Invalid: another core modifies the data, so this copy is invalidated.

When the state is Modified:

Modified + Local Read → Modified: the core reads its own modified data; the state is unchanged.
Modified + Local Write → Modified: the core writes its own modified data again; the state is unchanged.
Modified + Remote Read → Shared: another core reads the data, so the modified data is written back to memory first and both lines are set to S.
Modified + Remote Write → Invalid: another core writes the data, so the modified data is written back first and this copy is invalidated.
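The four tables above can be collapsed into a single transition function. The sketch below, written from those tables, only models the state bookkeeping of one cache line; the data movement, write-backs, and bus messages that a real implementation performs are omitted:

```java
// A state-only sketch of the MESI transition tables above, for a single
// cache line. Data transfer, write-back, and bus traffic are omitted;
// only the next state for each (current state, event) pair is modeled.
public class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    static State next(State current, Event event, boolean otherCachesHoldLine) {
        switch (event) {
            case LOCAL_WRITE:
                // Any local write leaves this copy Modified (the peers get invalidated).
                return State.MODIFIED;
            case REMOTE_WRITE:
                // Another core wrote the line: this copy becomes Invalid.
                return State.INVALID;
            case LOCAL_READ:
                if (current == State.INVALID) {
                    // Miss: Exclusive if no other cache holds the line, otherwise Shared.
                    return otherCachesHoldLine ? State.SHARED : State.EXCLUSIVE;
                }
                return current; // M, E, S all stay as they are on a local read
            case REMOTE_READ:
                // Another core read the line: M and E drop to Shared, S and I are unchanged.
                if (current == State.MODIFIED || current == State.EXCLUSIVE) {
                    return State.SHARED;
                }
                return current;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        s = next(s, Event.LOCAL_READ, false);  // EXCLUSIVE: no other cache held the line
        s = next(s, Event.LOCAL_WRITE, false); // MODIFIED
        s = next(s, Event.REMOTE_READ, false); // SHARED: another core read the line
        System.out.println(s); // SHARED
    }
}
```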

References

1. "Big Talk Processor": the MESI cache coherency protocol


Origin: juejin.im/post/5d171fccf265da1bca51ef82