April 27th Python learning summary: GIL, process pool, thread pool, synchronous, asynchronous, blocking, non-blocking

1. GIL: Global Interpreter Lock  


The GIL is essentially a mutex lock that is clamped onto the interpreter itself.

All threads within the same process must acquire the GIL before they can execute interpreter code.
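A thread does not hold the GIL for its whole lifetime: CPython forces the running thread to release it at a configurable interval so other threads get a turn. A minimal sketch using the standard `sys` switch-interval API:

```python
import sys

# CPython periodically forces the running thread to release the GIL so
# that other threads in the process can acquire it and run.
# The interval is configurable and defaults to 0.005 seconds:
print(sys.getswitchinterval())

# It can be tuned (rarely needed); here we just set it back to the default.
sys.setswitchinterval(0.005)
```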

2. Advantages and disadvantages of GIL:

Advantage: guarantees thread safety for the CPython interpreter's memory management

Disadvantage: only one thread per process can execute at any one time, which means multithreading under the CPython interpreter cannot achieve parallelism

 

2. GIL and multithreading  

With the GIL in place, only one thread in the same process executes at any given moment.

Hearing this, some students immediately ask: processes can use multiple cores but have high overhead, while Python's threads have low overhead but cannot take advantage of multiple cores. Does that mean Python is useless and PHP is the most powerful language?


To solve this problem, we need to agree on several points:

#1. Is the CPU being used for computation, or for I/O?

#2. Multiple CPUs mean multiple cores can perform calculations in parallel, so multi-core improves computation performance

#3. Whenever a CPU hits I/O blocking it still has to wait, so multiple cores do not help with I/O operations

A worker is like a CPU: computation is the worker doing his job, and I/O blocking is the process of supplying the raw materials the worker needs. If the raw materials run out while the worker is working, he must stop and wait until they arrive.

If most of the tasks in your factory involve preparing raw materials (I/O-intensive), then no matter how many workers you have it makes little difference; they mostly stand idle waiting.

Conversely, if your factory always has raw materials on hand, then of course the more workers, the higher the efficiency.

 

Conclusion:

  For computation, more CPUs are better; for I/O, more CPUs are useless.

  Of course, for running a whole program, execution efficiency will still improve as CPUs are added (however small the improvement), because a program is almost never pure computation or pure I/O. So we can only judge whether a program is relatively computation-intensive or I/O-intensive, and from that analyze whether Python's multithreading is useful.

#Analysis:
We have four tasks to handle, and we want a concurrent effect. The options are:
Option 1: start four processes
Option 2: start four threads under one process

#Single-core case:
  If the four tasks are computation-intensive, there is no multi-core to compute in parallel anyway, and option 1 only adds the cost of creating processes, so option 2 wins
  If the four tasks are I/O-intensive, option 1 still pays the cost of creating processes, and process switching is far slower than thread switching, so option 2 wins

#Multi-core case:
  If the four tasks are computation-intensive, multi-core means parallel computing; in Python only one thread per process executes at a time and cannot use multiple cores, so option 1 wins
  If the four tasks are I/O-intensive, no number of cores can remove the I/O waiting, so option 2 wins

#Conclusion: computers today are basically multi-core. For computation-intensive tasks, Python's multithreading brings little performance gain and may even be worse than serial execution (which avoids a lot of switching); but for I/O-intensive tasks, efficiency improves significantly.

  

 

Multithreaded performance test

 

from multiprocessing import Process
from threading import Thread
import os, time

def work():
    res = 0
    for i in range(100000000):
        res *= i

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # this machine has 4 cores
    start = time.time()
    for i in range(4):
        p = Process(target=work)  # takes a bit over 5s
        # p = Thread(target=work)  # takes a bit over 18s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
Computationally intensive: multi-process is efficient
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    time.sleep(2)
    print('===>')

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # this machine has 4 cores
    start = time.time()
    for i in range(400):
        # p = Process(target=work)  # takes over 12s, most of it spent creating processes
        p = Thread(target=work)  # takes a bit over 2s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
I/O intensive: multi-threading is efficient

 

 

3. Process pool and thread pool    

Process pool vs thread pool

Why use a "pool": a pool limits the number of concurrent tasks, keeping concurrency within a range our machine can afford

When to put processes in the pool: the concurrent tasks are computation-intensive

When to put threads in the pool: the concurrent tasks are I/O-intensive

 

Process pool

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time, os, random

def task(x):
    print('%s is serving' % os.getpid())
    time.sleep(random.randint(2, 5))
    return x ** 2

if __name__ == '__main__':
    p = ProcessPoolExecutor()  # the default number of processes is the number of CPU cores

    # alex, Wu Peiqi, Yang Li, Wu Chenyu, Zhang San

    for i in range(20):
        p.submit(task, i)

 

Thread Pool

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time, os, random

def task(x):
    print('%s is serving' % x)
    time.sleep(random.randint(2, 5))
    return x ** 2

if __name__ == '__main__':
    p = ThreadPoolExecutor(4)  # the default number of threads is the number of CPU cores * 5

    # alex, Wu Peiqi, Yang Li, Wu Chenyu, Zhang San

    for i in range(20):
        p.submit(task, i)
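To react to each result the moment its task finishes, `Future.add_done_callback` (part of `concurrent.futures`) can be attached at submit time. A small sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    return x ** 2

results = []

def on_done(future):
    # Called in a worker thread as soon as the task completes
    results.append(future.result())

with ThreadPoolExecutor(4) as pool:
    for i in range(3):
        pool.submit(task, i).add_done_callback(on_done)

# Leaving the with-block waits for all tasks, so results is complete here
print(sorted(results))  # [0, 1, 4]
```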

 

4. Synchronous, asynchronous, blocking, non-blocking


1. Blocking and non-blocking describe two running states of a program:

Blocking: occurs on I/O. Once the program hits a blocking operation it stops in place and immediately gives up the CPU.

Non-blocking (ready or running state): the program encounters no I/O, or by some means does not stop in place even when it does hit I/O, but goes on with other work, occupying the CPU as much as possible.

2. Synchronous and asynchronous describe two ways of submitting tasks:

Synchronous call: after submitting a task, wait in place until the task completes and its return value is obtained, and only then continue to the next line of code.

Asynchronous call: after submitting a task, do not wait in place but execute the next line of code directly; the result is fetched later through the object the submission returns.

 

Asynchronous call

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time, os, random

def task(x):
    print('%s is serving' % x)
    time.sleep(random.randint(1, 3))
    return x ** 2

if __name__ == '__main__':
    # Asynchronous call
    p = ThreadPoolExecutor(4)  # the default number of threads is the number of CPU cores * 5

    # alex, Wu Peiqi, Yang Li, Wu Chenyu, Zhang San

    obj_l = []
    for i in range(10):
        obj = p.submit(task, i)
        obj_l.append(obj)

    # p.close()
    # p.join()
    p.shutdown(wait=True)  # stop accepting tasks and wait for all submitted tasks to finish

    print(obj_l[3].result())
    print('main')
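Instead of calling shutdown and then reading the futures in order, `concurrent.futures.as_completed` yields each future the moment it finishes, which fits the asynchronous style above. A sketch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random, time

def task(x):
    time.sleep(random.random() * 0.2)
    return x ** 2

finished = []
with ThreadPoolExecutor(4) as pool:
    futures = [pool.submit(task, i) for i in range(5)]
    for fut in as_completed(futures):  # yields futures in completion order
        finished.append(fut.result())

print(sorted(finished))  # [0, 1, 4, 9, 16]
```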

  

 Synchronous call

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time, os, random

def task(x):
    print('%s is serving' % x)
    time.sleep(random.randint(1, 3))
    return x ** 2

if __name__ == '__main__':
    # Synchronous call
    p = ThreadPoolExecutor(4)  # the default number of threads is the number of CPU cores * 5

    # alex, Wu Peiqi, Yang Li, Wu Chenyu, Zhang San

    for i in range(10):
        res = p.submit(task, i).result()  # .result() blocks until this task is done

    print('main')
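The practical difference shows up in wall-clock time: calling `.result()` immediately after each submit serializes the pool, while submitting everything first lets the four threads overlap their sleeps. A rough timing sketch (exact numbers vary by machine):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(x):
    time.sleep(0.2)
    return x ** 2

# Synchronous style: wait for each task before submitting the next
start = time.time()
with ThreadPoolExecutor(4) as pool:
    sync_results = [pool.submit(task, i).result() for i in range(4)]
sync_elapsed = time.time() - start  # roughly 4 * 0.2s

# Asynchronous style: submit all tasks first, collect results afterwards
start = time.time()
with ThreadPoolExecutor(4) as pool:
    futures = [pool.submit(task, i) for i in range(4)]
    async_results = [f.result() for f in futures]
async_elapsed = time.time() - start  # roughly 0.2s: the sleeps overlap

print(sync_results == async_results, sync_elapsed > async_elapsed)  # True True
```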

 
