Python Study Notes: Processes and Threads

Table of Contents

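1. Multiprocessing
2. Creating Processes in Batches - Pool
3. Inter-Process Communication
4. Multithreading
5. Safely Modifying Shared Data Across Threads - Lock
6. Why Python Threads Cannot Run in Parallel - GIL
7. ThreadLocal
8. Summary of the Pros and Cons of Processes and Threads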

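1. Multiprocessing

The multiprocessing module is Python's cross-platform module for multi-process programming.

The multiprocessing module provides a Process class that represents a process object. The following example starts a child process and waits for it to finish: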

from multiprocessing import Process
import os

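# Code to be executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))


# The guard is required on platforms that start child processes with
# "spawn" (e.g. Windows), where the child re-imports this module.
if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')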

Parent process 4032.
Child process will start.
Run child process test (14600)...
Child process end.

【Summary】

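【1】Call Process() to create a new process. target is the function the process will run, and args is the tuple of positional arguments passed to that function (here 'test' becomes the name parameter of run_proc). args does not name the process itself; that is what the separate name keyword argument is for.

【2】Call start() to start the process.

【3】Call join() to wait until the process has finished.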
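
A minimal sketch of how target, args, and the optional name argument relate. The helper greet and the process name 'worker-1' are illustrative choices, not part of the example above.

```python
from multiprocessing import Process, current_process
import os

def greet(who):
    # `who` comes from args; the process name is looked up separately.
    return 'Hello %s from %s (pid %s)' % (who, current_process().name, os.getpid())

def run_proc(who):
    print(greet(who))

if __name__ == '__main__':
    # args supplies the positional arguments of run_proc;
    # name labels the Process object itself (illustrative value).
    p = Process(target=run_proc, args=('test',), name='worker-1')
    p.start()
    p.join()
    # exitcode is 0 when the child finished normally.
    print('Child %s exited with code %s' % (p.name, p.exitcode))
```

After join() returns, p.exitcode tells the parent whether the child finished normally.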

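2. Creating Processes in Batches - Pool

If you need to start a large number of child processes, you can create them in batches with a process pool, Pool: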

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

Parent process 14960.
Waiting for all subprocesses done...
Run task 0 (11872)...
Run task 1 (1460)...
Run task 2 (12424)...
Run task 3 (15260)...
Task 0 runs 0.50 seconds.
Run task 4 (11872)...
Task 2 runs 1.17 seconds.
Task 3 runs 1.57 seconds.
Task 4 runs 1.67 seconds.
Task 1 runs 2.25 seconds.
All subprocesses done.

【Notes】

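【1】Calling join() on a Pool object waits for all child processes to finish. You must call close() before join(), and once close() has been called no new tasks can be submitted to the pool.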

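【2】The pool size is set to 4 but 5 tasks are submitted, so tasks 0 to 3 start immediately, while task 4 has to wait until one of the worker processes becomes free. In the output above, task 4 runs in the same process (11872) that finished task 0.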

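【3】The default size of a Pool is the number of CPUs on the machine (the value of os.cpu_count(); Python 3.13 and later use os.process_cpu_count()).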
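
The example above discards what the tasks return. As a hedged sketch, the following shows two ways to get return values back from a Pool; the function square is a hypothetical stand-in for real work.

```python
from multiprocessing import Pool

def square(x):
    # Runs in a worker process; the return value is sent back to the parent.
    return x * x

if __name__ == '__main__':
    # Leaving the with-block shuts the pool down; both calls below have
    # already returned their results by then.
    with Pool(4) as pool:
        # map() blocks until every task is done and keeps the input order.
        results = pool.map(square, range(5))
        # apply_async() returns an AsyncResult; get() waits for the value.
        single = pool.apply_async(square, args=(7,)).get(timeout=10)
    print(results)  # [0, 1, 4, 9, 16]
    print(single)   # 49
```

map() is convenient when every task calls the same function; apply_async() fits better when tasks are submitted one at a time, as in the loop above.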

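3. Inter-Process Communication

Python's multiprocessing module wraps the underlying operating-system mechanisms and offers several ways to exchange data between processes, such as Queue and Pipe.

The parent process below creates two child processes: one writes data to a Queue and the other reads data from it: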

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the writer process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
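    # The parent process creates the Queue and passes it to each child process:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw, which writes:
    pw.start()
    # Start the child process pr, which reads:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop, so we cannot wait for it to finish and must terminate it forcibly: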
    pr.terminate()

Process to write: 3660
Put A to queue...
Process to read: 14876
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.

【Summary】

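【1】Call Queue() to create a queue.

【2】Call put(value) to write data to the queue.

【3】Call get(True) to read data from the queue; the True argument makes the call block until an item is available.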
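
Terminating the reader by force works, but a common alternative is to let the writer send a sentinel value that tells the reader to stop. The sketch below assumes None is never a real payload; the name STOP and the returned list are illustrative additions.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel (an assumption of this sketch): means "no more data"

def write(q):
    for value in ['A', 'B', 'C']:
        q.put(value)
    # Tell the reader that nothing else is coming.
    q.put(STOP)

def read(q):
    received = []
    while True:
        value = q.get(True)   # block until an item is available
        if value is STOP:
            break             # leave the loop instead of being terminated
        print('Get %s from queue.' % value)
        received.append(value)
    return received

if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    pw.start()
    pr.start()
    pw.join()
    pr.join()   # returns once the reader has seen the sentinel
```

With a sentinel, both children can be joined normally and the reader gets a chance to clean up before it exits.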

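4. Multithreading

Python's standard library provides two modules, _thread and threading. _thread is the low-level module, and threading is the high-level module that wraps it. In the vast majority of cases, threading is the only one we need.

To start a thread, pass a function to the Thread constructor to create a Thread instance, then call start() to begin execution: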

import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

【Summary】

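【1】To create a thread, first call threading.Thread(); target specifies the function the thread will run, and name specifies the thread's name.

【2】Call start() to start the thread.

【3】Call join() to wait until the thread has finished.

【Notes】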

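Every process starts with one thread by default, which we call the main thread, and the main thread can in turn start new threads. Python's threading module has a current_thread() function that always returns the instance of the current thread. The main thread's instance is named MainThread, and a child thread's name is specified when it is created; here we name the child thread LoopThread. The name is only used for display when printing and has no other meaning. If you do not supply a name, Python automatically names threads Thread-1, Thread-2, and so on (Python 3.10 and later also append the target function's name, for example Thread-1 (loop)).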
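
The same start()/join() pattern extends to several threads. This is a minimal sketch; the worker function, the shared results dict, and the Worker-N names are illustrative.

```python
import threading

def worker(results, index):
    # Record which thread handled which index.
    results[index] = threading.current_thread().name

results = {}
threads = [
    threading.Thread(target=worker, args=(results, i), name='Worker-%d' % i)
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for every thread before reading results

print(results)
```

Each thread writes to its own key, so no lock is needed here; the order in which the keys are inserted depends on scheduling.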

5. Safely Modifying Shared Data Across Threads - Lock

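In a multithreaded program, global variables and any other objects reachable from more than one thread are shared by all threads, so any thread can modify them. The greatest danger of sharing data between threads is therefore that several threads change the same variable at the same time and corrupt its contents.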
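
To see why an unprotected update is unsafe, it helps to notice that a single statement such as balance = balance + n is not one step for the interpreter. The sketch below uses the standard dis module to list the bytecode instructions of a small function; the exact instruction names differ between Python versions, so no output is shown.

```python
import dis

balance = 0

def change_it(n):
    global balance
    balance = balance + n
    balance = balance - n

# Collect the names of the bytecode instructions that make up change_it().
ops = [ins.opname for ins in dis.get_instructions(change_it)]
print(ops)
```

The global is loaded, modified, and stored back in separate instructions. If another thread runs between the load and the store, one of the two updates is lost.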

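To guarantee a correct result, we put a lock around change_it(). When a thread is about to execute change_it(), it first acquires the lock; other threads then cannot execute change_it() at the same time and must wait until the lock is released and they acquire it themselves. There is only one lock, so no matter how many threads there are, at most one thread holds it at any moment and the modifications cannot conflict. A lock is created with threading.Lock():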

import time, threading

# Suppose this is your bank balance:
balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit first, then withdraw; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify:
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

【Notes】

【1】When several threads call lock.acquire() at the same time, only one of them acquires the lock and continues; the others keep waiting until they acquire it.

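【2】To use a lock, first create it with threading.Lock(), then call lock.acquire() before the code that needs protection and lock.release() once that code has finished. Putting release() in a finally block, as above, ensures the lock is released even if an exception is raised; otherwise the waiting threads would block forever.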
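
A Lock also works as a context manager, which gives the same acquire/release behaviour with less code. The following is a sketch of the same bank-balance example rewritten with a with statement.

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # Acquires the lock on entry and releases it on exit,
        # even if change_it() raises.
        with lock:
            change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # 0
```

Because every update happens while the lock is held, the final balance is always 0.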

6. Why Python Threads Cannot Run in Parallel - GIL

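The CPython interpreter is designed around a GIL (Global Interpreter Lock), which prevents multiple threads from using multiple cores. Before any Python thread can run, it must first acquire the GIL. In Python 2 the interpreter released the GIL automatically after every 100 bytecode instructions; since Python 3.2 it instead asks the running thread to release the GIL after a time-based switch interval (5 ms by default), and the GIL is also released while a thread waits on blocking I/O. Either way, other threads get a chance to run. In effect the GIL puts a lock around the execution of all Python code in every thread, so threads can only take turns: even with 100 threads on a 100-core CPU, Python bytecode uses only one core at a time.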

For CPU-bound code, true multi-core parallelism with threads in Python is just a beautiful dream.
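
A rough way to observe this, sketched under the assumption of a standard (GIL-enabled) CPython 3 build: sys.getswitchinterval() reports how long a thread may run before it is asked to release the GIL, and a CPU-bound loop is timed once sequentially and once in two threads. The function count_down and the constant N are illustrative; timings vary by machine, so no output is shown.

```python
import sys
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop: it never waits on I/O.
    while n > 0:
        n -= 1

N = 1_000_000

# Interval (in seconds) after which the running thread is asked to
# release the GIL so that another thread can run.
interval = sys.getswitchinterval()

# Run the work twice in the main thread.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print('switch interval: %s s' % interval)
print('sequential: %.3f s, two threads: %.3f s' % (sequential, threaded))
```

On such a build the two-thread run is typically no faster than the sequential one, because only one thread executes Python bytecode at a time.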

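7. ThreadLocal

How can each thread have its own variables that it can read and write freely without interfering with other threads? The local class provided by threading solves this problem: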

import threading

# Create a global ThreadLocal object:
local_school = threading.local()

def process_student():
    # Get the student associated with the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind the student to the ThreadLocal object:
    local_school.student = name
    process_student()

t1 = threading.Thread(target= process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target= process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

Hello, Alice (in Thread-A)
Hello, Bob (in Thread-B)

【Summary】

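【1】The global variable local_school is a ThreadLocal object. Every thread can read and write its student attribute without affecting the others. You can think of local_school as a global variable whose attributes, such as local_school.student, are local to each thread: they can be read and written freely without interference, and there is no locking to manage because ThreadLocal handles it internally.

【2】You can think of the global variable local_school as a dict: besides local_school.student, you can bind other attributes to it, such as local_school.teacher.

【3】ThreadLocal is most commonly used to bind per-thread resources such as a database connection, an HTTP request, or the user's identity, so that every function called in that thread can access them conveniently.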
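
The isolation can be checked directly. In this sketch (the names local_data, seen, and check are illustrative), an attribute set in the main thread is invisible to a new thread, and the value the new thread sets does not leak back.

```python
import threading

local_data = threading.local()
local_data.student = 'Main'   # set in the main thread only
seen = {}

def check(name):
    # A new thread starts with an empty view of local_data.
    seen['before'] = hasattr(local_data, 'student')
    local_data.student = name
    seen['after'] = local_data.student

t = threading.Thread(target=check, args=('Alice',))
t.start()
t.join()

print(seen)                # {'before': False, 'after': 'Alice'}
print(local_data.student)  # Main
```

Reading an attribute that the current thread has never set raises AttributeError, which is why the sketch uses hasattr() for the first check.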

8. Summary of the Pros and Cons of Processes and Threads

【Processes】

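Pros: High stability, because a crash in one child process does not affect the main process or the other child processes.

Cons: Creating a process is expensive.

【Threads】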

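Pros: On Windows, where creating a process is relatively expensive, multithreading is generally more efficient than multiprocessing.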

Cons: Low stability. A crash in any one thread can bring down the whole process, because all threads share the process's memory.