I. Introduction
I originally set out to revisit how volatile is implemented. Understanding its two guarantees, preventing instruction reordering and ensuring data visibility, is inseparable from understanding the CPU cache, so this article is a refresher on the CPU cache.
II. Why the CPU Needs a Cache
CPU performance has grown at the pace of Moore's Law (though lately more and more voices say it has ended), roughly doubling every 18-24 months. Memory, by contrast, has improved much more slowly, and the performance gap between CPU and memory keeps widening. To cushion the speed difference between the two, a three-level cache (L1, L2, L3) built from SRAM was introduced to improve the CPU's computational efficiency. Of course, memory is not inherently incapable of being fast; it is simply a trade-off between cost and capacity.
In 1965, Intel co-founder Gordon Moore proposed what became known as "Moore's Law": the number of components that fit on an integrated circuit doubles every 18-24 months, and performance doubles with it.
It is like grocery shopping: we buy only a few things, so the actual shopping time is short, but the commute and the checkout queue usually take up most of the time. For cost reasons, it is clearly impossible to equip every residential block with a supermarket, so vending machines downstairs and neighborhood convenience stores were introduced, which naturally improves shopping efficiency.
III. The Three-Level Cache Structure: L1, L2, L3
The three cache levels are integrated into the CPU and organized as follows: each CPU core has its own L1 Cache and L2 Cache, while the L3 Cache is shared by all cores. L1 sits closest to the Execution Units and is generally nearly as fast as the computation itself; L2 and L3 follow in order. In addition, the L1 Cache is usually split into an L1d data cache and an L1i instruction cache, which reduces conflicts on multi-core CPUs and contention between threads fighting over the cache.
When reading data, the execution unit accesses the levels in order: first L1; if the data is not there, L2; if L2 also misses, L3; and finally memory.
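The fall-through order above can be sketched as a toy model, with each level modeled as a plain map. This is purely illustrative, not how hardware is wired; the class and field names are my own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the lookup order: a read falls through L1 -> L2 -> L3 and
// finally "memory", and a miss fills L1 so the next read hits immediately.
class CacheLookup {
    Map<Long, String> l1 = new HashMap<>();
    Map<Long, String> l2 = new HashMap<>();
    Map<Long, String> l3 = new HashMap<>();
    Map<Long, String> memory = new HashMap<>();

    String read(long addr) {
        for (Map<Long, String> level : List.of(l1, l2, l3)) {
            String v = level.get(addr);
            if (v != null) return v;      // cache hit at this level
        }
        String v = memory.get(addr);      // missed every level: go to memory
        if (v != null) l1.put(addr, v);   // fill L1 for subsequent reads
        return v;
    }
}
```

In real hardware a miss fills a whole Cache Line rather than a single value, and evictions cascade between levels; the sketch only shows the lookup order.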
The three cache levels are generally small. Taking my own machine, an i5-8259U, as an example:
i5-8259u:
L1 Data Cache: 32.0 KB x 4
L1 Instruction Cache: 32.0 KB x 4
L2 Cache: 256 KB x 4
L3 Cache: 6.00 MB
L4 Cache: 0.00 B
Memory: 16.0 GB 2133 MHz LPDDR3
IV. Cache Line: The Smallest Unit of Data Exchange with Memory
A Cache is divided into N Cache Lines, each typically 32 or 64 bytes in size; the Cache Line is the smallest unit of data exchanged with memory. A Cache Line contains at least three parts: Valid, Tag, and Block, where the Block stores the data, the Tag identifies the memory address, and the Valid flag indicates whether the data is valid.
When a CPU core accesses data and finds it in a Cache Line whose Valid flag is set, it is a cache hit; otherwise it is a cache miss. Typically the cost difference between a hit and a miss is on the order of hundreds of core clock cycles. Therefore, to raise the cache hit ratio, rational and effective data-placement and replacement policies are essential to the CPU's computational efficiency. It is like a convenience store stocking high-frequency goods according to the shopping habits of nearby residents and adjusting as customer preferences evolve.
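The three Cache Line fields and the hit test can be pictured with a toy model. The 64-byte block size and the address-to-line mapping are illustrative assumptions, not a description of any specific CPU.

```java
// Toy model of the three Cache Line fields named above.
class CacheLine {
    boolean valid;                  // Valid: is the cached data usable?
    long tag;                       // Tag: which 64-byte line of memory this holds
    byte[] block = new byte[64];    // Block: the cached data itself

    // A lookup is a hit only when the line is valid AND the tag matches
    // the line-aligned part of the address.
    boolean hit(long addr) {
        return valid && tag == addr / 64;
    }
}
```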
The CPU mainly decides which data to cache according to spatial locality and temporal locality.
Spatial locality: if a memory location is accessed, locations near it are likely to be accessed soon as well.
Temporal locality: if a memory location is accessed, it is likely to be accessed again many times in the near future.
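Spatial locality is easy to see in code: summing a 2D array row by row walks consecutive addresses and reuses each fetched cache line, while summing column by column jumps a whole row ahead on every access. Both methods return the same total; on large arrays the row-major walk is typically markedly faster.

```java
class Locality {
    // Row-major: inner loop touches consecutive memory addresses,
    // so each cache line fetched is fully used.
    static long sumRowMajor(int[][] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                s += a[i][j];
        return s;
    }

    // Column-major: each access strides one full row ahead,
    // wasting most of every cache line it pulls in.
    static long sumColMajor(int[][] a) {
        long s = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                s += a[i][j];
        return s;
    }
}
```

(A caveat: in Java a `int[][]` is an array of row objects, so rows are not guaranteed contiguous with each other, but within a row the elements are, which is enough for the effect to show.)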
There are three common cache replacement policies: FIFO (First In First Out), LRU (Least Recently Used), and LFU (Least Frequently Used):
FIFO: First In First Out; entries are evicted in the order they entered the cache, oldest first.
LRU: Least Recently Used; usage is tracked and the entry unused for the longest time is evicted. This is the most widely used policy.
LFU: Least Frequently Used; over a window of time, the entry used least often is evicted.
(Exercises for the LRU and LFU algorithms: Leetcode-LRU, Leetcode-LFU.)
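As a companion to those exercises, here is a minimal LRU sketch built on `LinkedHashMap`'s access-order mode rather than a hand-rolled doubly linked list; the class name and capacity handling are my own choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: accessOrder=true makes get() move an entry to the
// tail of the internal list, so the head is always the least recently used.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // third argument enables access ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the LRU entry once over capacity
    }
}
```

The Leetcode version expects O(1) `get`/`put` built from a HashMap plus a doubly linked list, which is essentially what `LinkedHashMap` does internally.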
V. MESI: Cache Coherency
The multi-level cache, combined with effective placement and replacement policies, greatly improves the CPU's computational efficiency, but it also introduces the problem of cache coherency.
5.1 The two underlying operations
To keep the caches consistent, the CPU provides two low-level operations: Write Invalidate and Write Update.
Write Invalidate: when one core modifies a piece of data, every other core that holds a copy marks its copy's Valid flag as invalid.
Write Update: when one core modifies a piece of data, every other core that holds a copy updates it to the new value.
Write Invalidate is simpler to implement, and other cores do not need to track the data change. The disadvantage is that the Valid flag applies to a whole Cache Line, so other data in the same line that was still valid gets invalidated along with it. Write Update generates a lot of update traffic, but it only needs to propagate the modified data rather than a whole Cache Line. Most processors use Write Invalidate.
These two operations correspond to what we coders commonly do with application caches: when the data changes, either invalidate the cache entry or update it in place. Of course, for data with weaker real-time requirements, we also often use a fixed TTL and let entries expire automatically.
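The contrast between the two operations can be sketched in a toy model where each "core" holds a copy of a single value plus a valid flag. This illustrates the idea only, not real hardware signalling; all names are hypothetical.

```java
class WritePolicy {
    int[] value = new int[4];                    // one cached copy per core
    boolean[] valid = {true, true, true, true};  // start with all copies valid

    // Write Invalidate: write locally, mark every other copy invalid.
    void writeInvalidate(int core, int v) {
        value[core] = v;
        for (int i = 0; i < valid.length; i++)
            valid[i] = (i == core);
    }

    // Write Update: push the new value into every copy that is still valid.
    void writeUpdate(int core, int v) {
        for (int i = 0; i < value.length; i++)
            if (valid[i]) value[i] = v;
    }
}
```

The trade-off from the text is visible here: `writeInvalidate` touches only flags but destroys otherwise-valid copies, while `writeUpdate` keeps every copy fresh at the cost of broadcasting the value.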
5.2 MESI protocol
Write Invalidate provides a simple idea for solving cache coherency, but a concrete implementation needs a complete protocol. The classic one, often used in textbooks, is the MESI protocol, and many later protocols are extensions of MESI.
Unlike the conventional Cache Line structure described above, under the MESI protocol two bits at the head of a Cache Line represent the four MESI states.
These four states are:
M (Modified): the data has been modified and is valid, but it exists only in this Cache and is inconsistent with memory.
E (Exclusive): the data is valid and exclusive; it exists only in this Cache and is consistent with memory.
S (Shared): the data is valid but not exclusive; copies exist in multiple Caches and are consistent with memory.
I (Invalid): the data is invalid.
The following schematic makes the four states easier to understand; the images are taken from "lying processor":
Note: in the E and S states, the cached data is consistent with memory. When a core modifies the data in its Cache, the change is not immediately written back to memory; instead the Cache Line is marked M, and the copies in other cores are marked I. At that point the data is inconsistent with memory.
The following is a schematic diagram of the transitions between the four states:
From the state transition diagram, note that whatever state a Cache Line is currently in, the two write events behave uniformly: a Local Write moves this line to Modified and sets the corresponding Cache Line in every other core to Invalid, with the write-back to memory deferred until a later trigger; a Remote Write moves this line to Invalid, so the cached data effectively has to be rebuilt.
Next, the transitions out of each of the four states are described in turn:
When the state is Invalid:

Current state | Event | Next state | Explanation
---|---|---|---
Invalid | Local Read | Exclusive or Shared | If no other Cache holds the data, read it from memory and set this line to E. Otherwise: (1) if another Cache holds it in state M, that Cache writes it back to memory first; (2) the data is then read, and every line holding it, including this one, is set to S
Invalid | Local Write | Modified | Read the data from memory, modify it, and set this line to M; any other Cache holding the data invalidates its copy (a copy in state M is written back to memory first)
Invalid | Remote Read | Invalid | Another core reads the data; this line holds nothing valid, so nothing changes
Invalid | Remote Write | Invalid | Another core writes the data; this line holds nothing valid, so nothing changes
When the state is Exclusive:

Current state | Event | Next state | Explanation
---|---|---|---
Exclusive | Local Read | Exclusive | Reading our own exclusive data; the state naturally stays the same
Exclusive | Local Write | Modified | Set to M because the data is now inconsistent with memory, pending a write-back
Exclusive | Remote Read | Shared | Another core reads the data; it is no longer exclusive, so both copies become S
Exclusive | Remote Write | Invalid | Another core modifies the data; our copy is stale and becomes I
When the state is Shared:

Current state | Event | Next state | Explanation
---|---|---|---
Shared | Local Read | Shared | Reading shared data changes nothing; the state stays the same
Shared | Local Write | Modified | Set to M because the data is now inconsistent with memory, pending a write-back; the copies in other cores become I
Shared | Remote Read | Shared | Another core reads the shared data; nothing changes
Shared | Remote Write | Invalid | Another core modifies the data; the uniform policy is to invalidate our copy
When the state is Modified:
Current state | Event | Next state | Explanation
---|---|---|---
Modified | Local Read | Modified | Reading our own modified data; the state stays the same
Modified | Local Write | Modified | Writing our own modified data again; the state stays the same
Modified | Remote Read | Shared | Another core reads the data; our copy is written back to memory first, then both copies become S
Modified | Remote Write | Invalid | Another core modifies the data; our copy is written back to memory and then invalidated
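The four tables above can be condensed into a single transition function. The Invalid + Local Read row needs one extra input, whether another cache already holds the data, passed here as `sharedElsewhere`; everything else follows the tables directly. This is a sketch of the state machine only, not a full protocol implementation (it omits the write-backs and bus messages).

```java
class Mesi {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    static State next(State s, Event e, boolean sharedElsewhere) {
        switch (e) {
            case LOCAL_WRITE:
                return State.MODIFIED;        // every table: Local Write -> M
            case REMOTE_WRITE:
                return State.INVALID;         // every table: Remote Write -> I
            case LOCAL_READ:
                if (s == State.INVALID)       // miss: fill as E, or S if held elsewhere
                    return sharedElsewhere ? State.SHARED : State.EXCLUSIVE;
                return s;                     // hit: state unchanged
            case REMOTE_READ:
                if (s == State.MODIFIED || s == State.EXCLUSIVE)
                    return State.SHARED;      // another reader: no longer exclusive
                return s;
            default:
                return s;
        }
    }
}
```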