Digging into Java Multithreading (6) --- The Principle of volatile

Introduction:

At the hardware level, a computer can be abstracted as a bus, I/O devices, main memory, and one or more processors (CPUs). Data lives in main memory, while the CPU executes instructions. The CPU is very fast: most simple instructions take only one clock cycle, whereas a read from main memory takes tens to hundreds of cycles, so every access to main memory stalls the CPU significantly. This latency gap is what gave rise to the CPU cache.

That is, while a program runs, the data it needs is first copied from main memory into the CPU cache. The CPU then reads and writes the cache directly while computing, and when the computation finishes the result is written back from the cache to main memory. This greatly reduces the latency of the CPU fetching data from main memory. The basic layout looks like this:
[Figure 1: a single CPU core with its cache between it and main memory]
Figure 1 can be regarded as a simple single-core model. Take the operation i++ as an example: when the program executes, the value of i is first fetched from main memory into the cache, then loaded from the cache into the CPU and incremented by 1; when the operation completes, the result is written back to the cache, and finally from the cache back to main memory. On this single-core model the operation causes no problems. But ever since computers were invented, hardware design has pursued two goals: doing more at once, and computing faster. The result is the move from a single core to multiple cores, each with its own tiered cache. A typical layout looks like this:
[Figure 2: multi-core CPU, each core with its own cache, sharing main memory]
In Figure 2, the same i++ operation becomes a problem, because a multi-core CPU can run threads in parallel: Core 0 and Core 1 can each copy i into their own cache and increment it independently. Suppose i starts at 1; we expect the final result to be 2, but after both cores finish computing and write back, the value of i in main memory might be 2, or it might be something else entirely.

This is a problem inherent in the hardware memory architecture: the cache coherence problem. When Core 1 changes the value of the variable i, Core 0 does not know, still holds the stale copy, and ends up operating on dirty data.

To solve this hardware-level problem, CPU vendors defined rules of their own.

The early solution:

Bus locking: locking the bus is easy to understand. Looking back at Figure 2, a variable is copied from main memory into a cache, and after the computation completes it is written back to main memory; both transfers travel over the bus. Since dirty data arises when several CPUs operate on the same variable at the same time, blocking the other CPUs from the bus guarantees that only one CPU can operate on the variable at a time, so subsequent reads and writes never see dirty data. The drawback of bus locking is equally obvious: it effectively turns a multi-core machine back into a single-core one, so efficiency is poor.

On today's multi-core machines, such a solution is less and less acceptable: the efficiency is simply too low! To improve it, CPU and system vendors came up with the following scheme.

Cache locking: that is, cache coherence protocols, mainly MSI, MESI, and MOSI. Their core idea: when a CPU writes data and finds that the variable is shared, i.e. a copy of the variable also exists in other CPUs' caches, it signals the other CPUs to mark the cache line holding that variable as invalid. When another CPU later needs to read the variable, it finds that its own cache line for the variable is invalid, and re-reads the value from memory.
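The invalidation idea behind these protocols can be sketched as a toy simulation (an illustration only, with made-up class names; a real MESI implementation is a hardware protocol with more states and transitions):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of MESI-style invalidation: a shared "bus" snoops writes
// and invalidates every other core's copy of the cache line.
enum LineState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    LineState state = LineState.INVALID;
    int value;
}

class SnoopingBus {
    final List<CacheLine> lines = new ArrayList<>();
    int mainMemory;

    int read(CacheLine line) {
        if (line.state == LineState.INVALID) { // miss: reload from main memory
            line.value = mainMemory;
            line.state = LineState.SHARED;
        }
        return line.value;
    }

    void write(CacheLine line, int v) {
        for (CacheLine other : lines) {        // snoop: invalidate all other copies
            if (other != line) other.state = LineState.INVALID;
        }
        line.value = v;
        line.state = LineState.MODIFIED;
        mainMemory = v;                        // simplified: write back immediately
    }
}
```

In this sketch, after one core writes, every other core's line is INVALID, so its next read misses and fetches the fresh value, which is exactly the behavior the protocol is meant to guarantee.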

That concludes the background theory; now let's get to the main topic.

In Java development, when a variable must be visible to all threads, we can use the volatile keyword to achieve this.

We know that volatile guarantees visibility and ordering but does not guarantee atomicity; to guarantee atomicity we need a locking mechanism such as synchronized.

So how does volatile guarantee visibility and ordering?

Suppose initFlag is a boolean variable that defaults to false: public boolean initFlag = false;
When one thread changes it to true, if the field is not declared volatile, other threads will not know that it has been modified to true.

Note: Please refer to the Java Memory Model

[Figure: JMM view of initFlag without volatile -- the update stays in the writer's working memory]
After adding the volatile keyword: public volatile boolean initFlag = false;
[Figures: JMM view of initFlag with volatile -- the write is flushed to main memory and other threads' cached copies are invalidated]
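The initFlag scenario can be turned into a runnable sketch (class and thread names are mine). With volatile, the reader thread is guaranteed to see the update; remove the keyword and, on some JVMs, the loop may spin forever because the thread never re-reads the flag from main memory:

```java
public class VisibilityDemo {
    // Remove volatile here and the reader thread may never observe the change.
    private static volatile boolean initFlag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!initFlag) {
                // busy-wait until the writer's update becomes visible
            }
            System.out.println("initFlag change observed");
        });
        reader.start();

        Thread.sleep(100); // give the reader time to start spinning
        initFlag = true;   // volatile write: flushed and made visible to the reader
        reader.join();     // terminates promptly because visibility is guaranteed
    }
}
```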

With the introduction above and the two figures, we should now understand how volatile achieves visibility and ordering (this is the key point).

Now let's summarize.

The low-level implementation of volatile relies on an assembly instruction that locks the cache for that memory region and writes it back to main memory; this operation is called "cache locking". The MESI cache coherence protocol (a hardware-level protocol implemented by the major CPU vendors) prevents a memory region cached by two or more processors from being modified simultaneously. Only the store-to-write stage is locked (which is what guarantees ordering, via memory barriers), whereas the old bus lock locked every stage; because the lock granularity is greatly reduced, this is also called a lightweight lock. When one processor's cached value is written back to memory over the bus, the other CPUs, through bus snooping, forcibly invalidate their own cached copies of that data and can only re-read it from main memory. This is what guarantees visibility.

Supplement: memory barriers.
A memory barrier provides three functions:

  1. It ensures that instruction reordering never moves instructions that come after the barrier to before it, nor instructions that come before the barrier to after it;
  2. It forces modifications in the cache to be written to main memory immediately;
  3. For a write operation, it causes the corresponding cache lines in other CPUs to be invalidated.

How are these three functions achieved? Let's look at the barrier-insertion strategy:

  1. A StoreStore barrier is inserted before every volatile write;
  2. A StoreLoad barrier is inserted after every volatile write;
  3. A LoadLoad barrier is inserted after every volatile read;
  4. A LoadStore barrier is inserted after every volatile read.
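These barriers are what make the classic "safe publication" pattern work. A minimal sketch (class and field names are mine):

```java
public class SafePublication {
    static int payload;                    // plain, non-volatile field
    static volatile boolean ready = false; // volatile guard

    static void writer() {
        payload = 42;  // ordinary write
        ready = true;  // volatile write: the StoreStore barrier before it
                       // keeps the payload write from being reordered after it
    }

    static int reader() {
        while (!ready) {
            // spin until the volatile read sees true
        }
        // the LoadLoad/LoadStore barriers after the volatile read guarantee
        // this read of payload sees 42, not a stale 0
        return payload;
    }
}
```

Without volatile on ready, the write to payload could be reordered after the write to ready, and a reader could see ready == true while payload is still 0.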

Finally, we can convert the code to assembly instructions to see how the Java virtual machine actually implements this.
Note: the conversion to assembly can be done with -XX:+PrintAssembly; for how to set this up on Windows, see here (https://dropzone.nfshost.com/hsdis.xht).

Why can't volatile guarantee atomicity?

As explained above, volatile only invalidates other threads' cached copies when the volatile variable is modified. So during i++, multiple threads may still read the same value at the same time and each add 1 to it; one thread finishes first and pushes its result to main memory, at which point the other threads' cached copies are invalidated, even though they really did perform the +1.
(In plain words: I did the increment, but it didn't count.)
(More formally: from load to store into memory there are four steps in total; only at the last step does the JVM make the latest value of the variable visible to all threads, i.e. all CPU cores receive the new value. The steps in between (from load to store) are unsafe: if another CPU modifies the value in the meantime, this thread's update is lost.)
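The lost update is easy to reproduce (a sketch; the class name and thread counts are arbitrary). The volatile counter usually ends below the expected total, while AtomicInteger, whose increment is a single atomic read-modify-write, always reaches it:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {
    static volatile int volatileCounter = 0;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    volatileCounter++;               // read-modify-write: NOT atomic
                    atomicCounter.incrementAndGet(); // atomic CAS loop
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // volatileCounter is often less than 40000; atomicCounter is always 40000
        System.out.println("volatile counter: " + volatileCounter);
        System.out.println("atomic counter:   " + atomicCounter.get());
    }
}
```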



Origin blog.csdn.net/weixin_44046437/article/details/99243420