Python's underwhelming multithreading

Author: DarrenChan Chen Chi
Link : https://www.zhihu.com/question/23474039/answer/269526476
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.

Before introducing threads in Python, let's be clear about one thing: multithreading in Python is fake multithreading! To see why, we first need to understand one concept, the global interpreter lock (GIL).

The execution of Python code is controlled by the Python virtual machine (the interpreter). Python was designed so that only one thread executes in the interpreter's main loop at any given time. This is similar to running multiple processes on a single-CPU system: many programs can be held in memory, but only one of them is running on the CPU at any moment. Likewise, although the Python interpreter can host multiple threads, only one of them runs in the interpreter at a time.

Access to the Python virtual machine is controlled by the Global Interpreter Lock (GIL), which ensures that only one thread is running at a time. In a multithreaded environment, the Python virtual machine executes as follows.

1. Acquire the GIL.

2. Switch to a thread to run.

3. Run the thread until it has used up its turn or voluntarily gives up control.

4. Put the thread back to sleep.

5. Release the GIL.

6. Repeat the steps above.
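
The cycle above can be imitated with a toy model. The sketch below is only an illustration, not how CPython is implemented: an ordinary `threading.Lock` (the name `toy_gil` is made up here) stands in for the GIL, and each pass through the loop plays the role of one turn.

```python
import threading

# Toy model only: an ordinary lock stands in for the GIL.
toy_gil = threading.Lock()
trace = []  # records which thread ran each slice of work


def worker(name, slices):
    for _ in range(slices):
        with toy_gil:            # steps 1-2: take the lock, this thread now runs
            trace.append(name)   # step 3: run one slice of work
        # steps 4-5: leaving the with-block releases the lock so that
        # another thread can be scheduled; the next iteration is step 6


threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(trace))  # 3000: every slice ran, one thread at a time
```

Because only the lock holder appends to `trace`, the slices of the three threads are interleaved but never run simultaneously.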

For all I/O-oriented code (code that calls into the operating system's built-in C routines), the GIL is released before the I/O call so that other threads can run while this thread waits for I/O. A thread that does little I/O, on the other hand, holds the processor and the GIL for its entire time slice. In other words, I/O-intensive Python programs benefit from multithreading far more than compute-intensive ones.
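
A minimal sketch of this effect, assuming `time.sleep` is an acceptable stand-in for a blocking I/O call (it also waits without holding the GIL). The helper names `fake_io` and `run_threads` are made up for this illustration.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep waits without holding the GIL, much like a blocking read
    time.sleep(seconds)


def run_threads(count, seconds):
    """Start `count` waiting threads and return the total wall-clock time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.2)
print(f"4 waits of 0.2 s took {elapsed:.2f} s in total")
```

If the waits ran one after another they would need about 0.8 s in total; since they overlap, the measured time should stay close to a single wait.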

Suppose I have a 4-core CPU. Normally each core runs one thread at a time, so four threads can run in parallel, and time slices are rotated among any further threads. Python is different: no matter how many cores you have, only one thread executes Python code at any moment, and the threads take turns. Sounds incredible? But that is what the GIL does. Before any Python thread can execute, it must first acquire the GIL. In older versions of CPython the interpreter released the GIL automatically after every 100 bytecode instructions to give other threads a chance to run; since Python 3.2 it is released after a time interval instead (5 ms by default). The GIL effectively locks the executing code of all threads, so threads in Python can only run alternately. Even 100 threads on a 100-core CPU can use only one core. The interpreter most of us use is CPython, the official implementation; to really take advantage of multiple cores with threads, you would need an interpreter without a GIL.
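
On Python 3.2 and later, how often the interpreter asks the running thread to hand over the GIL can be inspected from the `sys` module. A small sketch; the value printed depends on the interpreter's configuration.

```python
import sys

# Seconds after which the interpreter asks the running thread to give up the GIL
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")

# sys.setswitchinterval(0.001) would request more frequent switches.
# It only changes how often threads take turns, not how many run at once.
```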

Let's do an experiment:

#coding=utf-8
from threading import Thread


def loop():
    # Busy loop: pure CPU work with no I/O, so the thread never waits
    while True:
        pass

if __name__ == '__main__':

    # Start 3 worker threads
    for i in range(3):
        t = Thread(target=loop)
        t.start()

    # The main thread spins as well, giving 4 busy threads in total
    while True:
        pass

My computer has 4 cores, so I ran 4 busy threads (3 new threads plus the main thread) and looked at the CPU usage:

CPU utilization was nowhere near full; it was roughly what a single busy core would produce.

What if we switch to processes?

Let's change the code:

#coding=utf-8
from multiprocessing import Process


def loop():
    # Busy loop: pure CPU work with no I/O
    while True:
        pass

if __name__ == '__main__':

    # Start 3 child processes, each with its own interpreter and GIL
    for i in range(3):
        t = Process(target=loop)
        t.start()

    # The main process spins as well, giving 4 busy processes in total
    while True:
        pass

CPU utilization immediately shot up to 100%, which shows that multiple processes can take advantage of multiple cores!

To confirm that the GIL in Python is responsible, I wrote the same program in Java, started the threads, and observed the result:
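
The infinite loops above can only be judged by watching a system monitor. A finite variant makes the difference measurable. This is a rough sketch, not a rigorous benchmark: the helper names `burn` and `timed` and the loop count are chosen only for illustration, and the timings depend on the machine and the Python build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn(n):
    # Pure-Python countdown: CPU-bound work that holds the GIL while it runs
    while n > 0:
        n -= 1
    return n


def timed(executor_cls, workers, n):
    """Run `workers` copies of burn(n) on the given executor; return seconds taken."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        list(pool.map(burn, [n] * workers))
    return time.perf_counter() - start


if __name__ == '__main__':
    workers, n = 4, 5_000_000
    print(f"threads:   {timed(ThreadPoolExecutor, workers, n):.2f} s")
    print(f"processes: {timed(ProcessPoolExecutor, workers, n):.2f} s")
```

On a standard CPython build with several cores, the thread version should take about as long as running the four countdowns one after another, while the process version should finish noticeably sooner.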

package com.darrenchan.thread;

public class TestThread {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            new Thread(new Runnable() {

                @Override
                public void run() {
                    while (true) {

                    }
                }
            }).start();
        }
        while(true){

        }
    }
}

 

As you can see, multithreading in Java can take advantage of multiple cores; that is true multithreading! Multithreading in Python can only use a single core; that is fake multithreading!

Is that really the end of it? Is there no way to use multiple cores in Python? Of course there is! The multi-process approach shown above is one solution; another is to call into a C library. As noted earlier, the GIL is released before blocking I/O calls so that other threads can run while one thread waits. In the same way, we can write compute-intensive work in C, build it as a .so shared library, and load it into Python. Running C code does not release the GIL automatically: a C extension has to release it explicitly, while a library loaded with ctypes.CDLL has the GIL released for the duration of each foreign function call. While the GIL is released, other threads can run, and so we reach the goal of every core running a thread!
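
Building a .so library does not fit in a short example, but the standard library already ships C code that behaves this way: CPython's `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes. The sketch below uses it as a stand-in for a hand-written C library; the buffer sizes and the helper name `hash_into` are arbitrary choices for the illustration.

```python
import hashlib
import threading

# Four 8 MiB buffers, far above the 2047-byte threshold at which
# CPython's hashlib releases the GIL during an update.
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
digests = [None] * len(buffers)


def hash_into(index):
    # The hashing itself runs in C with the GIL released,
    # so several of these calls can proceed on different cores.
    h = hashlib.sha256()
    h.update(buffers[index])
    digests[index] = h.hexdigest()


threads = [threading.Thread(target=hash_into, args=(i,)) for i in range(len(buffers))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(digests[0][:16])  # first 16 hex characters of the first digest
```

With much larger inputs, a system monitor should show more than one core busy while the threads run, unlike the pure-Python busy loop.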

Some readers may not be clear on what a compute-intensive task is and what an I/O-intensive task is.

Compute-intensive tasks are characterized by heavy computation that consumes CPU resources, such as calculating pi or decoding high-definition video; they depend entirely on the CPU's processing power. Such tasks can also be run concurrently, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU works. To make the best use of the CPU, the number of simultaneous compute-intensive tasks should equal the number of CPU cores.
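
As a small sketch of that sizing rule, the number of workers for compute-intensive work can be taken from the core count the operating system reports. The helper name `cpu_bound_workers` is made up for this illustration.

```python
import os


def cpu_bound_workers():
    # os.cpu_count() can return None when the count cannot be determined,
    # so fall back to a single worker in that case
    return os.cpu_count() or 1


print(f"workers for compute-intensive tasks: {cpu_bound_workers()}")
```

The result could then be passed as the size of a process pool, for example `multiprocessing.Pool(processes=cpu_bound_workers())`.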

Because compute-intensive tasks mainly consume CPU resources, the efficiency of the code matters a great deal. Scripting languages like Python run slowly and are completely unsuitable for compute-intensive tasks; such tasks are best written in C.

The second type of task is I/O-intensive. Any task that involves network or disk I/O is I/O-intensive. Such tasks consume little CPU and spend most of their time waiting for I/O operations to complete (because I/O is far slower than the CPU and memory). For I/O-intensive tasks, the more tasks there are, the more efficiently the CPU is used, although there is a limit. Most common tasks, such as web applications, are I/O-intensive.

While an I/O-intensive task runs, 99% of the time is spent on I/O and very little on the CPU. Replacing a slow scripting language like Python with the extremely fast C language therefore does almost nothing to improve running efficiency. For I/O-intensive tasks, the most suitable language is the one with the highest development efficiency (the least code): scripting languages are the first choice, and C is the worst.

In summary, Python multithreading is equivalent to multithreading on a single core. Multithreading offers two advantages, CPU parallelism and I/O parallelism, and single-core multithreading loses the first, like working with one arm broken. So in Python you can use multithreading, but don't expect it to make efficient use of multiple cores. If you must use multiple cores through multithreading, it can only be achieved with C extensions, at the cost of Python's simplicity and ease of use. There is no need to worry too much, though. Although Python cannot use multiple cores through multithreading, it can through multiple processes. Each Python process has its own GIL, and the processes do not affect one another.
