Core Python Programming (3rd Edition) -- Multithreaded Programming

Notes from the book:

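Multithreaded programming is a form of parallel processing that can significantly improve the performance of an overall task.
A sequential program must use non-blocking I/O, or blocking I/O with a timeout, so that any blocking is only temporary.
Because a sequential program has only a single thread of execution, it must juggle every task it needs to perform, making sure that no single task takes too much time and that response time for the user is allocated sensibly. Writing this kind of workload as a sequential program often leads to very complicated control flow that is hard to understand and maintain.
A thread has a beginning, an execution sequence, and an end. It has an instruction pointer that keeps track of where within its context it is currently running. A thread can be preempted (interrupted) and temporarily put on hold (sleeping) while other threads run; this is called yielding.
All threads in a process share the same data space with the main thread, so they can share information and communicate with one another more easily than separate processes can.
Threads are not guaranteed a fair share of execution time. Some functions block until they complete, and unless they have been modified specifically for multithreaded use, CPU time allocation is skewed in favor of these greedy functions.
I/O-bound Python programs take much better advantage of a multithreaded environment than CPU-bound code does. In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, but it is released while a thread waits on blocking I/O.
Python provides several modules to support multithreaded programming, including thread, threading, and Queue. The thread module provides basic thread and locking support, while threading provides higher-level, more fully featured thread management. With the Queue module, you can create a queue data structure that is shared between threads. (In Python 3 the Queue module is named queue.)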
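
As a rough illustration of sharing a queue between threads, here is a minimal sketch using the Python 3 queue module. The names worker and square_all, and the use of None as a stop signal, are choices made for this example and do not come from the book.

```python
import queue
import threading


def worker(tasks, results):
    # Pull numbers from the shared queue until the sentinel None arrives.
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put(n * n)


def square_all(numbers, num_workers=2):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so that each one stops
    for t in threads:
        t.join()
    # Workers finish in no particular order, so sort before returning.
    return sorted(results.get() for _ in numbers)
```

Queue does its own locking internally, so the workers need no explicit lock around get() and put().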

Some tips:

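There are several reasons to use the higher-level threading module rather than thread. The threading module is more advanced and has better thread support, and some attributes of thread conflict with those of threading. Another reason is that the lower-level thread module has very few synchronization primitives, whereas threading has many.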
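
To make "synchronization primitives" concrete, here is a small sketch that uses threading.Lock, one of the primitives threading provides, to protect a shared counter. The function name count_with_lock and its parameters are made up for this illustration.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    counter = 0
    lock = threading.Lock()

    def add():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write below.
            with lock:
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Without the lock, counter += 1 is not guaranteed to be atomic, so concurrent updates could be lost.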
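Another reason to avoid thread is that it has no control over when the process exits. When the main thread ends, all other threads are killed as well, without any warning or proper cleanup. The threading module, by contrast, makes sure that important child threads finish before the process exits.
In Python 3 the thread module was renamed to _thread.
Yet another reason to avoid thread is that it does not support the concept of daemon threads. When the main thread exits, all child threads are terminated, whether or not they are still working. If you do not want that behavior, you need daemon threads. A daemon is typically a server that waits for client requests; when there are no requests, it sits idle. Marking a thread as a daemon means that the thread is not important and that the process does not need to wait for it to finish before exiting. So if the main thread is ready to exit and does not need to wait for certain child threads to complete, set the daemon flag on those threads.
To mark a thread as a daemon, assign thread.daemon = True before starting it. To check a thread's daemon status, simply read the same attribute.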
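
The daemon flag can be sketched as follows. idle_server is a hypothetical stand-in for a server loop waiting for requests; the second helper shows why the flag has to be set before start().

```python
import threading
import time


def idle_server():
    # Hypothetical stand-in for a server that waits for client requests.
    while True:
        time.sleep(0.1)


def start_daemon():
    t = threading.Thread(target=idle_server)
    t.daemon = True  # must be assigned before start()
    t.start()
    return t


def can_change_daemon_flag(t):
    # Changing the flag on a thread that is already running raises RuntimeError.
    try:
        t.daemon = False
    except RuntimeError:
        return False
    return True
```

Because the thread is a daemon, the interpreter can exit even though idle_server never returns.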

The threading module:

1. Using the threading module

import threading
from time import sleep, ctime

loops = [4, 2]

def loop(nloop, nsec):
    print('start loop', nloop, 'at:', ctime())
    sleep(nsec)
    print('loop', nloop, 'done at:', ctime())

def main():
    print('starting at:', ctime())
    threads = []
    nloops = range(len(loops))

    for i in nloops:
        t = threading.Thread(target=loop, args=(i, loops[i]))  # Instantiating Thread() does not start the new thread.
        threads.append(t)

    for i in nloops:
        threads[i].start()  # start() is called only after all threads have been created; none of them runs before this point

    for i in nloops:
        threads[i].join()  # join() waits for the thread to finish, or for the timeout to expire if one was given

    print('all done at:', ctime())

if __name__ == '__main__':
    main()
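
The listing above always calls join() without an argument. As a small sketch of the timeout form mentioned in the comment, the helper below (wait_briefly is a name invented for this example) joins with a timeout and then uses is_alive() to find out whether the thread actually finished, since join() returns None in both cases.

```python
import threading
from time import sleep


def wait_briefly(nsec, timeout):
    t = threading.Thread(target=sleep, args=(nsec,))
    t.start()
    t.join(timeout)                # gives up waiting after `timeout` seconds
    still_running = t.is_alive()   # True means the timeout expired first
    t.join()                       # now wait for the thread to really finish
    return still_running
```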

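2. Using a callable class

'''
In this example we pass in a callable class (instance) rather than just a function. This implementation offers a more object-oriented approach.
'''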
import threading
from time import sleep, ctime
loops = [4, 2]
class ThreadFunc(object):

    def __init__(self, func, args, name = ''):
        self.name = name
        self.func = func
        self.args = args

    def __call__(self):
        self.func(*self.args)

def loop(nloop, nsec):

    print('start loop', nloop, 'at:', ctime())
    sleep(nsec)
    print('loop', nloop, 'done at:', ctime())

def main():
    print('starting at:', ctime())
    threads = []
    nloops = range(len(loops))

    for i in nloops:
        t = threading.Thread(target=ThreadFunc(loop, (i, loops[i]), loop.__name__))  # Instantiating Thread() does not start the new thread.
        threads.append(t)

    for i in nloops:
        threads[i].start()

    for i in nloops:
        threads[i].join()

    print('all done at:', ctime())

if __name__ == '__main__':
    main()

3. Subclassing Thread

'''
In this example we subclass Thread instead of instantiating it directly.
'''
import threading
from time import sleep, ctime
loops = [4, 2]

class MyThread(threading.Thread):

    def __init__(self, func, args, name = ''):
        threading.Thread.__init__(self)
        self.name = name
        self.func = func
        self.args = args

    def run(self):  # start() invokes run(), so a Thread subclass must override run(), not __call__()
        self.func(*self.args)

def loop(nloop, nsec):

    print('start loop', nloop, 'at:', ctime())
    sleep(nsec)
    print('loop', nloop, 'done at:', ctime())

def main():
    print('starting at:', ctime())
    threads = []
    nloops = range(len(loops))

    for i in nloops:
        t = MyThread(loop, (i, loops[i]), loop.__name__)
        threads.append(t)

    for i in nloops:
        threads[i].start()

    for i in nloops:
        threads[i].join()

    print('all done at:', ctime())

if __name__ == '__main__':
    main()

4. The Thread subclass MyThread

'''
To make the Thread subclass more general, move it into a module of its own and add a callable getResult() method that fetches the return value.
'''
import threading
from time import sleep, ctime
loops = [4, 2]

class MyThread(threading.Thread):

    def __init__(self, func, args, name = ''):
        threading.Thread.__init__(self)
        self.name = name
        self.func = func
        self.args = args
        self.res = None  # set by run(); stays None until func has returned

    def getResult(self):
        return self.res

    def run(self):
        print('starting', self.name, 'at:', ctime())
        self.res = self.func(*self.args)
        print(self.name, 'finished at:', ctime())
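
The listing above defines the class but never uses it. Here is a minimal usage sketch: it repeats a condensed copy of MyThread (without the timing prints) so that it runs on its own, and the workload functions fac and total as well as the helper run_all are made up for this illustration.

```python
import threading


class MyThread(threading.Thread):
    # Condensed copy of the class above, without the timing prints.
    def __init__(self, func, args, name=''):
        threading.Thread.__init__(self)
        self.name = name
        self.func = func
        self.args = args
        self.res = None

    def getResult(self):
        return self.res

    def run(self):
        self.res = self.func(*self.args)


def fac(n):
    # Hypothetical workload: iterative factorial.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def total(n):
    # Hypothetical workload: sum of 0..n.
    return sum(range(n + 1))


def run_all(n=10):
    threads = [MyThread(f, (n,), f.__name__) for f in (fac, total)]
    for t in threads:
        t.start()
    results = {}
    for t in threads:
        t.join()  # getResult() is only meaningful once the thread has finished
        results[t.name] = t.getResult()
    return results
```

Calling getResult() before join() could return None, because run() may not have stored the value yet.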

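Example

This is the book-ranking example. The original code from the book does not run as-is because of urllib issues. The modified code is shown below, but it still does not return the rankings the way the original does, because the return statement still raises an error.
Error 1: HTTPError: HTTP Error 503: Service Unavailable on valid website (resolved)
Error 2: return REGEX.findall(data)[0] TypeError: expected string or bytes-like object (unresolved in the code below; suggestions welcome. The original code extracts the ranking with a regular expression). The TypeError arises because urlopen() returns an HTTP response object rather than a string, so the page body has to be read and decoded before it is passed to findall().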

from atexit import register
from re import compile
from threading import Thread
from time import ctime
from urllib.request import urlopen
import urllib.request

REGEX = compile(r'#([\d,]+) in Books ')  # raw string so that \d is not treated as a string escape
AMZN = 'https://amazon.com/dp/'
ISBNs = {
    '0132269937': 'Core Python Programing',
    '0132356139': 'Python Web Development with Django',
    '0137143419': 'Python Fundamentals'
}

def getRanking(isbn):
    page = urllib.request.Request(url= AMZN + isbn, data=b'None', headers={
        'User-Agent': 'Mozilla / 5.0(Windows NT 10.0; Win64; x64) AppleWebKit / 537.36(KHTML, like Gecko) Chrome / 69.0.3497.92 Safari / 537.36'})
    data = urlopen(page)
    # return REGEX.findall(data)[0]

def _showRanking(isbn):
    print('- {0} ranked {1}'.format(ISBNs[isbn], getRanking(isbn)))

def _main():
    print('At', ctime(), 'on Amazon...')
    for isbn in ISBNs:
        Thread(target=_showRanking, args=(isbn,)).start()

@register
def _atexit():
    print('all DONE at:', ctime())

if __name__ == '__main__':
    _main()

Output:

At Wed Oct 10 23:39:59 2018 on Amazon...
- Python Web Development with Django ranked None
- Python Fundamentals ranked None
- Core Python Programing ranked None
all DONE at: Wed Oct 10 23:40:02 2018

Process finished with exit code 0
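
One possible way around the unresolved TypeError is to decode the response body into a string before running the regular expression. The sketch below keeps the network out of the picture: extract_ranking is a hypothetical helper, and the sample bytes are invented test data, not a real Amazon response. Whether the live page still contains text matching this pattern is not something this sketch can confirm.

```python
import re

# Same pattern as in the listing above, written as a raw string.
REGEX = re.compile(r'#([\d,]+) in Books ')


def extract_ranking(raw_bytes, encoding='utf-8'):
    # findall() needs a str (or bytes with a bytes pattern), not a response object.
    text = raw_bytes.decode(encoding, errors='replace')
    matches = REGEX.findall(text)
    # Return None instead of raising IndexError when the pattern is absent.
    return matches[0] if matches else None


# Invented sample data for illustration only.
sample = b'<li>Best Sellers Rank: #1,234 in Books (See Top 100)</li>'
```

Inside getRanking, the bytes would come from urlopen(page).read(), so the last line could become return extract_ranking(urlopen(page).read()).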