Python multithreading is of little use; multiprocessing is recommended (with code samples)

When it comes to multithreaded Python, we often hear veterans say: "Python multithreading is of little use; use multiprocessing instead." But why do they say that?

To understand this, we need to know not just the what but the why. So let's dig in:

 

First, some background:

1. What is the GIL? GIL stands for Global Interpreter Lock. It originates from a decision made early in Python's design, for the sake of data safety.

2. At any given moment, each CPU core can execute only one thread (multithreading on a single-core CPU is really just concurrency, not parallelism; macroscopically, both concepts describe handling multiple requests at once, but they differ: parallelism means two or more events occurring at the same instant, while concurrency means two or more events occurring within the same time interval).


In Python multithreading, each thread executes as follows:
1. Acquire the GIL
2. Execute code until the thread sleeps or the Python virtual machine suspends it
3. Release the GIL

As you can see, a thread must first acquire the GIL before it can run. Think of the GIL as a "pass": within one Python process, there is only one. A thread that cannot get the pass is not allowed onto the CPU.


In Python 2.x, the current thread releases the GIL when it encounters an IO operation or when its tick count reaches 100 (ticks can be thought of as Python's internal counter dedicated to the GIL; it is reset to zero after each release, and the threshold can be adjusted with sys.setcheckinterval).

Each release of the GIL triggers lock contention among the threads, and switching threads consumes resources. Because of the GIL, only one thread in a Python process can ever be executing at a time (the thread that holds the GIL), which is why Python multithreading is inefficient even on multi-core CPUs.

 

So is Python multithreading completely useless?

Let's discuss it case by case:

1. CPU-intensive code (loops, counting, and so on): here the tick count quickly reaches the threshold, which triggers the GIL to be released and contended for again (and switching between multiple threads, of course, consumes resources). So Python multithreading is unfriendly to CPU-intensive code.
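The point above can be demonstrated with a minimal sketch: the same CPU-bound loop is run sequentially and then split across two threads. On CPython, because the GIL serializes bytecode execution, the threaded version is typically no faster (and often slower). The function names and the work size `N` are illustrative choices, not anything from the original article.

```python
import threading
import time

def cpu_task(n):
    # pure-Python loop: CPU-bound work that never blocks on IO
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

# sequential baseline: run the task twice in a row
start = time.perf_counter()
seq_results = [cpu_task(N), cpu_task(N)]
seq_time = time.perf_counter() - start

# the same two tasks on two threads: the GIL serializes the bytecode,
# so this usually shows little or no speedup on CPython
thr_results = []
def worker(n):
    thr_results.append(cpu_task(n))

start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
thr_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Exact timings depend on the machine, but the threaded run is not expected to approach a 2x speedup.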

2. IO-intensive code (file processing, web crawlers, and so on): multithreading can effectively improve efficiency. A single thread would sit idle waiting on IO, wasting time, whereas with multiple threads, while thread A waits the interpreter automatically switches to thread B, so no CPU time is wasted and program throughput improves. Thus Python multithreading is friendlier to IO-intensive code.
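A minimal sketch of the IO-bound case, using `time.sleep` to stand in for a blocking read (sleeping, like real blocking IO, releases the GIL): five 0.2-second waits overlap across five threads, so the total wall time stays close to 0.2 s instead of 1.0 s. The function name and timings are illustrative.

```python
import threading
import time

def fake_io():
    # time.sleep releases the GIL, just as a blocking socket/file read would
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# the five waits overlap, so total time is ~0.2s rather than ~1.0s
print(f"elapsed: {elapsed:.2f}s")
```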


In Python 3.x, the GIL no longer uses tick counting; instead it uses a timer (the running thread releases the GIL after an execution-time threshold is reached). This is friendlier to CPU-intensive programs, but it still does not solve the fundamental problem that the GIL allows only one thread to execute at a time, so efficiency remains unsatisfactory.
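That time-based threshold is exposed in Python 3 as the "switch interval" (it replaces Python 2's `sys.setcheckinterval`). A small sketch of inspecting and tuning it:

```python
import sys

# Python 3 replaces the tick counter with a time-based switch interval;
# the CPython default is 0.005 seconds
print(sys.getswitchinterval())

# the interval can be tuned, analogous to sys.setcheckinterval in Python 2
sys.setswitchinterval(0.01)
```

A longer interval means fewer forced thread switches (less overhead), at the cost of responsiveness between threads.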

 

Multithreading on multiple cores can even be worse than on a single core. The reason: on a single core, each time the GIL is released, the thread that is woken up can acquire it and continue seamlessly. On multiple cores, however, after the thread on CPU0 releases the GIL, threads on the other CPUs compete for it, but the GIL may be immediately re-acquired by the thread on CPU0. The threads on the other CPUs are then left awake but unable to run, waiting to be scheduled again, which causes thread thrashing and lowers efficiency.

 

Back to the original question: we often hear veterans say, "To take full advantage of a multi-core CPU under Python, use multiprocessing." Why is that?

The reason: each process has its own independent GIL, so processes do not interfere with one another and can execute in parallel in the true sense. Hence, in Python, multiprocessing is more efficient than multithreading (on multi-core CPUs, that is).

 

Conclusion: on multiple cores, if you want parallelism to improve efficiency, the common approach is multiprocessing, which works effectively.

A Python multiprocessing code example:

 

import multiprocessing

import numpy as np
import pandas as pd

def process_split_df(split):
    # method for processing one chunk of the data; placeholder returns it unchanged
    return split

# create a DataFrame here according to your own needs
df = pd.DataFrame()
# get the number of CPUs on the current machine
pool_size = multiprocessing.cpu_count()
# split the DataFrame we just created into one chunk per CPU
df_split = np.array_split(df, pool_size)
# create the process pool
pool = multiprocessing.Pool(processes=pool_size)
# start the processes and merge the results; process_split_df handles each chunk
result_df = pd.concat(pool.map(process_split_df, df_split))
pool.close()
pool.join()

  


---------------------
Disclaimer: this is an original article by the CSDN blogger "Yu L", released under the CC 4.0 BY-SA license. Please include the original source link and this statement when reposting.
Original link: https://blog.csdn.net/lambert310/article/details/50605748

Origin www.cnblogs.com/bianzhiwei/p/11359577.html