[Concurrency, Those Things] Visibility Problems: The Root of All Evil

Hardware engineers added the CPU cache to bridge the speed gap between the CPU and memory, and in multi-core scenarios that cache accidentally became the root of all evil for concurrency visibility problems! (This article is long; if you are in a hurry, that one sentence is the whole point.)

Foreword

Remember those multi-threaded BUGs you wrote back in the day? You just wanted 1 + 1 = 2, yet sometimes you got 1, sometimes 3, and only occasionally the correct 2. The code ran fine locally, then produced a pile of strange BUGs online. You examined the code over and over, debugged it line by line, and still found nothing.

Why did the variable suddenly change? Why did the code run out of order? Why did the condition check not work? Welcome to tonight's late-night science show... oh no, welcome to today's "Concurrency, Those Things": visibility problems, the root of all evil. As hinted above, when we write concurrent programs, problems often go beyond our understanding and intuition, and our past experience makes them hard to perceive. And because we do not understand the root cause, even if we apply the fix from a book, the same problem comes back next time. So the point of this article is not the technique for fixing the problem but the principle behind it: let's explore the root cause of multi-threading problems together.

First, be clear about this: most concurrency problems are caused by three issues, namely visibility problems, atomicity problems, and ordering problems. So what causes these three? This article walks step by step through the cause of visibility problems.

The Core Contradiction

As we all know, a computer has many components. The three most important are the CPU, memory, and I/O (the disk). In general, the performance of these three directly determines the overall performance of the computer.

But from the day they were born, there has been a core contradiction among them, one that even now, after years of rapid technological progress, remains unresolved. What contradiction?

Before getting to the contradiction, let me tell you about a colleague of mine. He is a gaming expert: he knows League of Legends inside out and is especially sharp at Honor of Kings. Whenever he watches a match he comments like a general commanding troops, full of confident analysis. But then he started actually playing: he fought like a tiger, and his record read 0-5. At first we thought he was bronze-tier, but over time we found that, just as he claimed, his predictions and his decisions really were quite brilliant. We were puzzled until we reached a conclusion: he truly was king-tier, because his head was full of brilliant plays; his hands just could not keep up with his brain.

Here is the point: the core contradiction is the speed difference. The CPU is like my colleague's brain, extremely quick and clever, while I/O is like the pair of hands that cannot keep up, dragging down the whole performance. And the speed gap between them is far beyond our imagination: if the CPU is a rocket, then memory is a tricycle, and I/O is a nondescript little snail on the roadside.

Everyone's Efforts

Since the problem exists, someone has to solve it, and the first attempts were made at the hardware layer, so hardware engineers bore the brunt and tried many approaches. Through their tireless efforts, the speeds of both memory and I/O improved dramatically. But the CPU engineers were not idle either; Intel's CEO Gordon Moore even announced a law named after himself, Moore's Law, which goes roughly like this:

The number of transistors that can fit on an integrated circuit doubles approximately every 18 months.


This can be simply understood as: CPU performance doubles roughly every 18 months. It left the memory and I/O hardware engineers in despair. It is bad enough when someone is smarter than you; it is worse when the smarter person also works harder than you. How are you supposed to compete?



Of course, one tree does not make a forest, and the CPU engineers realized the problem too: no matter how well the carry plays, it cannot 1v5. Push hard in a teamfight, look back, and your base has been taken. A concrete example: you double-click to open a movie. The CPU runs fast, but I/O loads slowly. However hard you push the CPU, it is useless, because I/O is the bottleneck. The result is that the movie turns into a slideshow, stuttering every second. If this went on, nobody could keep playing together.

Seeing their teammates stuck, the CPU engineers came up with an idea: carve out a small area inside the CPU to serve as a cache, sitting between the CPU and memory, with a role similar to the caches we use every day, namely balancing the speed difference between the CPU and memory. At startup, data is loaded from I/O into memory; during execution, data is loaded from memory into the CPU cache. After that, frequently used data lives in the CPU cache, so the CPU no longer has to go all the way to memory every time, which greatly improves CPU utilization.

At this point the hardware engineers had done their job; now it was the software engineers' turn to take the stage.

Although adding the cache increased CPU utilization enormously, the experience was still very poor: a task that used to run for 5 minutes and load for two hours now ran for 2 minutes and loaded for one hour. Take the movie again as an example: a movie has not just pictures but also sound. If you run fast but play the video first and the audio afterwards, you get a silent film followed by a radio drama. How is that perceptibly better than a slideshow?

Later, through the efforts of talented hardware and software engineers, an amazing thing was invented: the thread. Before talking about threads, let's first talk about processes. A process is something you can actually see: the browser you opened at startup, or the WeChat you are using right now. Once such software starts, it becomes a process in the operating system. And a thread? It can be simply understood as a subset of a process: a process is really a bundle of threads. The operating system normally allocates almost all hardware resources, including memory, to the process as a whole, and the process, like a labor contractor, hands them down to its threads. There is only one resource the operating system assigns directly to threads: CPU time.
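As a small sketch of this sharing (the class and field names here are my own illustration, not from the original article): threads within one process share that process's memory, while each thread is individually scheduled its own CPU time.

```java
public class SharedMemoryDemo {
    // A field lives in the process's memory, so every thread in the process can see it.
    static String shared = "set by main";

    public static void main(String[] args) throws InterruptedException {
        // The worker thread receives its own CPU time slices,
        // but writes into the same process-wide memory.
        Thread worker = new Thread(() -> shared = "set by worker");
        worker.start();
        worker.join(); // wait until the worker thread has finished
        System.out.println(shared); // prints "set by worker"
    }
}
```

Note that `join()` is what makes the worker's write reliably visible to the main thread here; without such synchronization, visibility is exactly the problem this article is about.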

This arrangement is actually very significant. Some may ask: why not assign CPU time to processes as well? Because a process is much heavier than a thread, and the switching overhead would be far too large. Which is faster when you want a new page: launching a whole new browser, or opening a new tab? Once we have lightweight threads, we get a very cool operation: the thread switch. What does it buy us? Back to the movie: we still play the video first and the audio second, but unlike before, we play a tiny slice of video, then a tiny slice of audio, and so on. As long as each slice is short enough and switching between the two is fast enough, it creates the illusion that the video and audio are playing simultaneously. Lightweight threads and fast switching are exactly what make this possible.
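To make the interleaving idea concrete, here is a toy sketch (the class name and "V"/"A" labels are my own): a single loop alternating tiny "video" and "audio" slices, which is the same pattern that fast switching between a video thread and an audio thread produces.

```java
public class InterleaveDemo {
    public static void main(String[] args) {
        StringBuilder timeline = new StringBuilder();
        // Alternate short "video" and "audio" slices, like a scheduler
        // rapidly switching between a video thread and an audio thread.
        for (int slice = 0; slice < 5; slice++) {
            timeline.append("V").append(slice).append(' '); // a sliver of video
            timeline.append("A").append(slice).append(' '); // a sliver of audio
        }
        System.out.println(timeline.toString().trim());
        // V0 A0 V1 A1 V2 A2 V3 A3 V4 A4
    }
}
```

If each slice lasts only a few milliseconds, a human viewer cannot tell the V and A work apart: they appear simultaneous.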

At this point, through the efforts of countless hardware and software engineers, the problem had been solved fairly well.

New Problems

If things had ended here, everyone would have lived happily ever after. But Intel came out to make trouble again, although this time, to be fair, it was forced to.

Remember Moore's Law, named after Intel CEO Gordon Moore, which we mentioned above? This law did not come out of rigorous scientific research; it was a conclusion extrapolated from Intel's own past performance. Logically that is not how science works: every programmer I know carries a laptop bag, but I cannot point at everyone on the street carrying a laptop bag and declare them a programmer. Yet Intel was just that good: it filled the street with programmers, so to speak, keeping this law alive and doubling CPU performance every 18 months for many years running.



Until, under the fourth CEO, Moore's Law suddenly stopped working. That CEO was Intel's Craig Barrett. At a technical conference, he knelt before the attendees to apologize for the 4 GHz Pentium 4 processor, which had been postponed again and again and was finally cancelled.

With that, Moore's Law ended and CPU development hit a bottleneck. Until one day a hardware engineer, struck by a flash of inspiration, knocked on Craig Barrett's office door: "Boss, you don't have to kneel anymore. I have a way to double the CPU's performance."

Craig, in tears, recalled that day: the fear of being dominated by those guys... the humiliation of being imprisoned in a cage...


Craig asked excitedly: "What's the plan?"

Hardware engineer: "It's simple! We just pack two CPUs into one extra-large CPU, and then its performance is the performance of two CPUs! I really am a clever boy!"

Craig, who had made CPUs his whole life, was so angry he nearly ended up in the ICU: "I would rather kneel for a lifetime than do something that stupid."


Then again, the picture below shows the 28-core CPU that Intel later released. Well?

[Image: Intel's 28-core CPU]

Of course, the above contains a bit of joking, but CPU development really did move toward more cores: from single-core to dual-core, then 6-core, 8-core, with the core count growing non-stop, and CPU performance kept growing along with it. This is actually much like the distributed architectures we software engineers commonly use: when single-machine performance hits a bottleneck and can no longer be improved by scaling the server vertically, the only option is to split the single-machine system into multiple distributed services and scale horizontally.

By increasing the number of CPU cores, the hardware engineers had seemingly completed the task their era entrusted to them. And the resulting blame landed squarely on the heads of us software engineers.

Come, let's look back. We said above that the CPU, memory, and I/O share a core contradiction: the speed difference. That difference was never truly resolved, but we worked around it. How? The hardware engineers carved out part of the CPU core as a cache to balance the difference between CPU and memory, and the software engineers, to maximize CPU utilization, invented the thread and solved the problem nicely by switching between multiple threads.

Ah, a perfect scheme, no problems at all. Provided, that is, it runs on a single-core CPU.

We just said that each CPU core has a place to cache data loaded from memory, so it does not have to hit memory every time, which improves efficiency. But here is the thing: a single core has one cache, while multiple cores have multiple caches. Add multi-threading on top of that, and what happens? Here is a real code example:

public class TestCount {
    private int count = 0;

    public static void main(String[] args) throws InterruptedException {
        TestCount testCount = new TestCount();
        // Two threads increment the same shared count field.
        Thread threadOne = new Thread(() -> testCount.add());
        Thread threadTwo = new Thread(() -> testCount.add());
        threadOne.start();
        threadTwo.start();

        // Wait for both threads to finish before reading the result.
        threadOne.join();
        threadTwo.join();

        System.out.println(testCount.count);
    }

    public void add() {
        // Each call increments the shared count 100,000 times.
        for (int i = 0; i < 100000; i++) {
            count++;
        }
    }
}


The code is simple: two threads both call the add method, which loops 100,000 times, incrementing the shared count variable on each iteration. Intuitively, count starts at 0, each thread adds 100,000, and there are two threads, so 100,000 * 2 = 200,000.

But is that the result? Not at all. On one run I got 113,595, and the result differs on every run; you can try it yourself. The results generally fall between 100,000 and 200,000, tending toward 100,000.

What on earth? Remember the CPU cache we talked about earlier? Yes, that is exactly the culprit. To make the explanation easier, I drew a few pictures.

[Image: execution flow on a single-core CPU]

The figure above shows the single-core case. First, count is loaded into memory with an initial value of 0. Then, as shown in step 1 of the figure, it is loaded into the CPU's cache; the CPU's processor takes it out of the cache, performs the add operation, and puts the result back into the cache; finally the cache writes count back to memory and we get the result. In the single-core case there is no problem, because all threads share the same cache and the same memory. Now let's look at the multi-core case.

[Image: execution flow on a multi-core CPU]

In the multi-core scenario, execution proceeds in the following steps:

  1. First, count is loaded into memory. Thread 1 runs on CPU 1 and loads count = 0 from memory into CPU 1's cache.
  2. CPU 1's processor loads count = 0 from its cache; by the time thread 1's time slice runs out, it has counted up to 13595.
  3. CPU 1 puts count = 13595 back into its cache, ready for the next round.
  4. The cached count = 13595 is flushed back to memory, and thread 1 waits for its next time slice.
  5. Thread 2 gets a time slice on CPU 2 and loads thread 1's half-finished count = 13595 from memory into CPU 2's cache.
  6. CPU 2 loads count = 13595 into its processor and starts computing. Meanwhile, CPU 1's next time slice is assigned to thread 1, which continues from its cached count = 13595, soon finishes all 100,000 of its increments, and flushes its final result to memory; memory now holds count = 100000.
  7. Thread 2 soon finishes its own 100,000 increments, producing 13595 + 100000 = 113595. It likewise flushes its result to memory, and the value in memory is now count = 113595.
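The seven steps above can be replayed deterministically by modeling each core's cache as a plain local variable (the 13,595 split point is taken from the run described above; the class name is my own):

```java
public class LostUpdateDemo {
    public static void main(String[] args) {
        int memory = 0;          // count in main memory
        int cache1 = memory;     // step 1: CPU 1 loads count = 0 into its cache

        for (int i = 0; i < 13595; i++) cache1++;       // step 2: thread 1's first time slice
        memory = cache1;         // step 4: flush, memory = 13595

        int cache2 = memory;     // step 5: CPU 2 loads the half-finished 13595

        for (int i = 13595; i < 100000; i++) cache1++;  // step 6: thread 1 finishes its 100,000 increments
        memory = cache1;         // flush, memory = 100000

        for (int i = 0; i < 100000; i++) cache2++;      // step 7: thread 2 does its own 100,000 increments
        memory = cache2;         // flush overwrites CPU 1's work

        System.out.println(memory); // 113595, not 200000
    }
}
```

Because thread 2's cached copy never saw thread 1's later increments, thread 2's final flush silently discards them: a classic lost update.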

See the problem? The cached count can be understood as a copy of the count in memory. Modifying the copy in the cache does not immediately change the value in memory; it is only flushed back after some time. Thread 1 flushed its half-finished value into memory, and thread 2 loaded that value into CPU 2 and computed on it. When CPU 1 later finished its computation and flushed its value to memory, CPU 2 was still computing; it had no idea that CPU 1 had changed the value, so when it finished, it flushed its own result to memory and overwrote everything CPU 1 had just worked half a day for.

The root cause of this problem is that the operations of CPU 1 and CPU 2 are not visible to each other. In fact, during this run there were three count variables in play: one count in memory, one copy in CPU 1's cache, and one copy in CPU 2's cache.
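Fixing the problem is beyond the scope of this article, but for the curious, here is one hedged sketch (one common remedy among several, and the class name is my own): make each increment an atomic, mutually visible operation with the JDK's AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCount {
    private final AtomicInteger count = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        AtomicCount c = new AtomicCount();
        Thread t1 = new Thread(c::add);
        Thread t2 = new Thread(c::add);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.count.get()); // always 200000
    }

    public void add() {
        for (int i = 0; i < 100000; i++) {
            // incrementAndGet is an atomic read-modify-write whose result
            // is immediately visible to the other thread
            count.incrementAndGet();
        }
    }
}
```

Other remedies (synchronized blocks, locks) work too; the common thread is that each read-modify-write must be indivisible and its result visible across cores.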

Conclusion

Hardware engineers added the CPU cache to bridge the speed difference between the CPU and memory, and in multi-core scenarios it accidentally became the root of concurrency visibility problems!

Other

This is the third article in the "Concurrency, Those Things" series; the previous two are:

  1. [Concurrency, Those Things] Three ways to create threads
  2. [Concurrency, Those Things] Producers and consumers


Origin juejin.im/post/5ddbcfda6fb9a07ab87387af