An analysis of blog_xtg, an open-source distributed blog

Source: https://github.com/xtg20121013/blog_xtg

Background worth reading first:

How to use tornadis

How to submit tasks with concurrent.futures.ThreadPoolExecutor

How to use an ORM, specifically SQLAlchemy

How to use APScheduler's TornadoScheduler

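main.py does three things in order:

It sets up the Tornado application

It sets up the Redis subscription

It sets up the TornadoScheduler

All three asynchronous tasks end up running on Tornado's IOLoop.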

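The TornadoScheduler job refreshes every cache: it rebuilds the cache from the database and notifies the other nodes, acting as a periodic job that recalibrates the cache. The job is a coroutine that does its work through yield on tornadis call, and it hands back its value with raise tornado.gen.Return(reply), because a Python 2 generator cannot return a value directly.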

Data lookups use another form of asynchronous call: blog_info = yield thread_do(BlogInfoService.get_blog_info, db). The handlers behind the routes (the Tornado application) do the same thing, for example pager = yield self.async_do(ArticleService.page_articles, self.db, pager, article_search_params)

thread_do and async_do both boil down to concurrent.futures.ThreadPoolExecutor.submit.

Most of what this thread pool runs is SQLAlchemy work, for example add_comment, which simply saves a row to the database:

comment_to_add = Comment(content=comment['content'], author_name=comment['author_name'],
                         author_email=comment['author_email'], article_id=article_id,
                         comment_type=comment['comment_type'], rank=comment['rank'], floor=floor,
                         reply_to_id=comment['reply_to_id'], reply_to_floor=comment['reply_to_floor'])
db_session.add(comment_to_add)
db_session.commit()

Another example is query = db_session.query(Article), which reads from the database and returns objects. Both kinds of call are synchronous.
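To make the pattern concrete, here is a minimal sketch that uses only the Python 3 standard library, so it runs without Tornado or SQLAlchemy. The thread_do function below is a hypothetical stand-in written for this note, not the project's actual helper, and save_comment fakes a blocking commit with time.sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: one shared pool, similar in spirit to what the project uses.
executor = ThreadPoolExecutor(max_workers=4)


def save_comment(store, comment):
    """Stand-in for a blocking db_session.add() followed by commit()."""
    time.sleep(0.05)  # pretend the database round trip takes a while
    store.append(comment)
    return threading.current_thread().name


def thread_do(func, *args, **kwargs):
    """Hypothetical helper: hand a blocking call to the pool, get a Future back."""
    return executor.submit(func, *args, **kwargs)


store = []
caller_thread = threading.current_thread().name
future = thread_do(save_comment, store, {"content": "hello"})
pending_right_after_submit = not future.done()  # submit() did not wait for the work
worker_thread = future.result()                 # blocks until the worker has finished
executor.shutdown()
```

The point to notice is that submit returns straight away with a pending Future, while the blocking work itself runs on a different thread.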

What is confusing is the way submit and coroutines are mixed together. How does that work?

class ThreadPoolExecutor(_base.Executor):
    def submit(self, fn, *args, **kwargs):
        with self._shutdown_lock:
            if self._shutdown:
                raise RuntimeError('cannot schedule new futures after shutdown')

            f = _base.Future()
            w = _WorkItem(f, fn, args, kwargs)

            self._work_queue.put(w)
            self._adjust_thread_count()
            return f

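My first guess: yielding the future returned by submit makes the coroutine give up control. Once the worker thread has finished the slow operation, it notifies the event loop in the original thread that the work is done, and the coroutine can carry on.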

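Looking closer: submit returns a Future immediately, so the event loop knows about it, but the Future is still pending, so the loop goes on to schedule other coroutines. A Future object can be shared across threads, so the worker thread can set its result and mark it done as soon as it finishes. When the event loop comes back to this coroutine, it passes in the actual value obtained from future.result().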
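The hand-off between the worker thread and the loop thread can be sketched with the Python 3 standard library alone. This is a toy model, not Tornado's IOLoop: the callbacks queue, on_done and blocking_query are names invented for this illustration.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

callbacks = queue.Queue()  # thread-safe inbox of the toy "event loop"
log = []
loop_thread = threading.current_thread().name


def blocking_query():
    """Stand-in for a slow database call; reports which thread ran it."""
    return "rows", threading.current_thread().name


def on_done(future):
    # Called once the future is done, normally from the worker thread.
    # It does no real work itself; it only posts a callback to the loop thread.
    callbacks.put(lambda: log.append((future.result(), threading.current_thread().name)))


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(blocking_query)
    future.add_done_callback(on_done)
    # Toy loop: the main thread waits for one callback and runs it.
    callbacks.get(timeout=5)()

(result_value, worker_thread), resumed_on = log[0]
```

The result is produced on the worker thread, but the code that consumes it runs back on the loop thread, which is the behaviour described above.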

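One concept matters a lot here. In result = yield future, the left-hand side does not receive the future itself. The yield first passes the future outward, through each enclosing generator, until it reaches the event loop. The event loop checks the future's state and then sends the future's result, f.result(), back down the same chain until it reaches the variable on the left of the equals sign:

loop --yield--> f3 --yield--> f2 --yield--> f1 --yield--> future (yield only passes a value out of the generator; it is when the generator is resumed, with f3.send(None) or f1.send(something), that something is delivered to result on the left of the equals sign).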
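The yield and send round trip can be tried out with plain generators. The sketch below is only an analogy under stated assumptions: it uses Python 3's yield from to chain f3, f2 and f1, which is not how Tornado chains coroutines on Python 2.7, and drive is a hypothetical toy loop that simply blocks on each future.

```python
from concurrent.futures import ThreadPoolExecutor

trace = []


def f1(pool):
    # The innermost generator yields the Future; it travels up to the driver.
    value = yield pool.submit(pow, 2, 10)
    trace.append(("f1 received", value))  # value is what the driver sent back
    return value


def f2(pool):
    value = yield from f1(pool)  # relays yields upward and sent values downward
    return value + 1


def f3(pool):
    value = yield from f2(pool)
    return value * 2


def drive(gen):
    """Toy event loop: advance the generator, answering each yielded Future."""
    to_send = None
    while True:
        try:
            future = gen.send(to_send)  # the first send(None) starts the generator
        except StopIteration as stop:
            return stop.value           # Python 3 carries the return value here
        to_send = future.result()       # a real loop would not block like this


with ThreadPoolExecutor(max_workers=1) as pool:
    final = drive(f3(pool))
```

Here f1 never sees the Future on the left of its assignment. It sees whatever drive passed to send, which is the Future's result.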

Coroutine style on Python 2.7:

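import time
from concurrent.futures import ThreadPoolExecutor

import tornado.ioloop
from tornado.gen import coroutine, Return

thread_pool = ThreadPoolExecutor(10)

# get_server() and insert_io() are helpers from the author's project and are not shown here.

def io_detect(server, _now, cycle):
    time.sleep(3)  # simulate a slow, blocking check
    return 3

@coroutine
def call_blocking(server):
    cycle = 1
    _now = time.time()
    # Run the blocking call in the pool; the coroutine resumes once it has finished.
    record = yield thread_pool.submit(io_detect, server, _now, cycle)
    raise Return(record)

@coroutine
def work():
    while 1:
        server_list = get_server()
        start0 = time.time()
        res = []
        # map() calls call_blocking on worker threads and hands back the futures it returns.
        for future in thread_pool.map(call_blocking, server_list):
            record = yield future
            if record:  # record is empty when the SSH connection fails
                res.append(record)

        insert_io(res)  # the database insert takes 0.10 to 0.15
        end = time.time()
        print "work:" + str(end - start0)

loop = tornado.ioloop.IOLoop.current()
loop.add_callback(work)  # if work takes arguments, pass them after work
loop.start()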

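This runs the work coroutine. Because work starts several child coroutines and they should run concurrently, map is used. In effect it does the same as the following, which starts every coroutine on the IOLoop thread rather than on worker threads and is the safer choice, since Tornado is documented as not thread-safe apart from IOLoop.add_callback:

for future in [call_blocking(server) for server in server_list]:
    record = yield future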

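Written like this, however, nothing runs concurrently:

for server in server_list:
    record = yield call_blocking(server)

The reason is that the for loop can only start the next coroutine after the current yield has fully completed. With the two forms above, every coroutine has already started (its job is already in the thread pool's queue) before the first yield, so they run concurrently.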
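The difference between starting everything first and waiting one at a time can be measured without Tornado. In this Python 3 sketch a blocking result() call plays the role of yield, and slow_probe and the server names are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_probe(server):
    """Stand-in for a blocking check that takes a fixed amount of time."""
    time.sleep(0.2)
    return server.upper()


servers = ["a", "b", "c", "d"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Start every job first, then wait: the sleeps overlap.
    start = time.time()
    futures = [pool.submit(slow_probe, s) for s in servers]
    eager = [f.result() for f in futures]
    eager_seconds = time.time() - start

    # Start one job, wait for it, then start the next: the sleeps add up.
    start = time.time()
    one_by_one = [pool.submit(slow_probe, s).result() for s in servers]
    one_by_one_seconds = time.time() - start
```

Both versions produce the same results. Only the elapsed time differs, because the second version never has more than one job in flight.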

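Synchronous parallel style:

as_completed calls set() on its argument, so passing the dict hands it the dict's keys, which are the futures.

as_completed keeps watching the futures and yields each one only once it has completed.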


# coding:utf-8
from concurrent import futures
import urllib2

URLS = ['http://www.xiaorui.cc/',
        'http://blog.xiaorui.cc/',
        'http://ops.xiaorui.cc/',
        'http://www.sohu.com/']


def load_url(url, timeout):
    print 'got task {0}'.format(url)
    return urllib2.urlopen(url, timeout=timeout).read()


with futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = dict((executor.submit(load_url, url, 60), url)
                         for url in URLS)

    for future in futures.as_completed(future_to_url):
        url = future_to_url[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (url,
                                                     future.exception()))
        else:
            print('%r page is %d bytes' % (url, len(future.result())))
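
The two remarks about as_completed can be checked without network access. The following is a small Python 3 sketch; nap and the delay values are invented for the illustration and are not part of the example above.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, as_completed


def nap(seconds):
    """Stand-in for a download that takes the given number of seconds."""
    time.sleep(seconds)
    return seconds


delays = {"slow": 0.3, "fast": 0.05, "medium": 0.15}

with ThreadPoolExecutor(max_workers=3) as pool:
    # Same shape as future_to_url: the Future is the key, the label is the value.
    future_to_name = {pool.submit(nap, d): name for name, d in delays.items()}
    # Iterating a dict (or calling set() on it) gives its keys, here the futures.
    keys_are_futures = all(isinstance(f, Future) for f in set(future_to_name))
    # Futures come out as they finish, not in submission order.
    finished = [future_to_name[f] for f in as_completed(future_to_name)]
```

Because each future is a dict key, the label that belongs to it can be looked up as soon as as_completed hands the future back.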