DeepWalk Source Code Walkthrough and Refactoring (1)

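1. Background

1.1. Object-oriented programming in Python in one article

Most source code is written in an object-oriented style, so a quick guide to object-oriented programming is provided here:
Click here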

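1.2. Tips for reading Python source code

When reading Python source code we often run into many pass statements without knowing where the real function or method body lives. There are two cases: reading up on Python's multiple inheritance explains the first, and the second has a different cause, described below.

(1) A class whose body is just pass

Click here

(2) Methods whose body is just pass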

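While studying the defaultdict class in the collections module, we find that many of its methods contain nothing but pass:
[Image: defaultdict methods whose bodies are pass]

If you scroll up to the top of that file, you will find a file header like this:
[Image: header comment at the top of the file]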

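What you are looking at is stub code that exists only to generate documentation and to feed static analysis tools. The real implementations of these functions live inside the interpreter, and their source is generally not visible from Python. There are several implementations of the Python interpreter, written in C, Java, Python and other languages. Unless you are doing something unusual, CPython, the implementation written in C, meets nearly every need, and it is also the default implementation provided officially. As part of the standard interpreter, the built-in functions are implemented in C as well, which means that under normal circumstances you would not even see a pass.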

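PyCharm maintains an index of every function, type and so on in the current interpreter, which is what makes features such as go-to-definition possible. For built-ins, however, there is no Python implementation to jump to; only the documentation (pydoc) is available. PyCharm therefore generates the signatures of these functions automatically from the documentation, and the result is functions whose body is pass. To see the actual implementations you have to look in the CPython source code. The official document "Extending Python with C or C++" shows how code written in C is made usable from Python.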
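As a quick check of the point above, the following minimal sketch asks the standard inspect module for the source of a few objects. The helper name source_available is hypothetical, introduced only for this illustration, and the behavior described assumes CPython.

```python
import inspect
from collections import defaultdict


def source_available(obj):
    """Return True if inspect can locate Python source for obj (hypothetical helper)."""
    try:
        inspect.getsource(obj)
        return True
    except (TypeError, OSError):
        # inspect raises TypeError or OSError when no Python source exists,
        # which is what happens for objects implemented in C.
        return False


# On CPython both objects are implemented in C, so no Python source is found.
print(source_available(len))
print(source_available(defaultdict))
```

When this reports that no source is available, the body you see in the IDE is a generated stub rather than the real implementation.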

Consider the following example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-


from collections import defaultdict

class Graph(defaultdict):
    def __init__(self):
        super(Graph, self).__init__(list)

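This is a file we wrote ourselves. Graph inherits from the defaultdict class, and its __init__ method calls the initializer of the parent class, so we jump into the parent class to take a look: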


# encoding: utf-8
# module _collections
# from (built-in)
# by generator 1.145
"""
High performance data structures.
- deque:        ordered collection accessible from endpoints only
- defaultdict:  dict subclass with a default value factory
"""
# no imports


class defaultdict(dict):
    def __init__(self, default_factory=None, **kwargs): 
        pass

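You will find that the parent class's __init__ is still just pass, but that does not mean the method is empty. The file that contains this parent class, _collections.py, is a stub generated for the built-in module _collections. The code in it is placeholder code, and the real implementation is written in C.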
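A small experiment makes it clear that the real initializer does something even though the stub shows pass. This is a minimal sketch that reuses the Graph class from the example above; the node values 1, 2 and 99 are arbitrary.

```python
from collections import defaultdict


class Graph(defaultdict):
    def __init__(self):
        # Delegates to the C implementation of defaultdict.__init__
        super(Graph, self).__init__(list)


g = Graph()

# The C initializer stored `list` as the default factory.
print(g.default_factory is list)

# Accessing a missing key creates an empty list instead of raising KeyError.
g[1].append(2)
print(g[1])
print(g[99])
print(99 in g)
```

If the parent __init__ really were empty, default_factory would stay None and g[1] would raise KeyError.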

2. Graph class source analysis

The Graph class inherits from collections.defaultdict, which in turn inherits from Python's built-in dict type. The details are covered later.
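The inheritance chain can be inspected directly through the method resolution order. This is a minimal sketch using the same two-line Graph definition as in section 1.2, not the full DeepWalk class.

```python
from collections import defaultdict


class Graph(defaultdict):
    def __init__(self):
        super(Graph, self).__init__(list)


# Names of the classes Python searches, in order, when looking up an attribute.
mro_names = [cls.__name__ for cls in Graph.__mro__]
print(mro_names)  # ['Graph', 'defaultdict', 'dict', 'object']

g = Graph()
# A Graph is a dict, so dict methods such as keys() and items() are available.
print(isinstance(g, dict))
```

Because Graph is a dict at heart, each key is a node and each value is the list of that node's neighbors.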

The skeleton of the Graph code is:

"""
This is a code skeleton the author uses for analysis, not the actual code!!
"""

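import logging
import random
from collections import defaultdict
from collections.abc import Iterable
from time import time

# Logger name is an assumption made for this skeleton
logger = logging.getLogger("deepwalk")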

class Graph(defaultdict):
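    def __init__(self):
        # Same initializer as in the example in section 1.2:
        # a missing node defaults to an empty neighbor list
        super(Graph, self).__init__(list)

    def nodes(self):
        pass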

    def adjacency_iter(self):
        # dict.iteritems() exists only in Python 2; items() works in Python 3
        return self.items()

    def subgraph(self, nodes={}):
        subgraph = Graph()

        for n in nodes:
            if n in self:
                subgraph[n] = [x for x in self[n] if x in nodes]

        return subgraph

    def make_undirected(self):

        t0 = time()

        # Iterate over a snapshot: self[other] may insert new keys during the loop
        for v in list(self.keys()):
            for other in self[v]:
                if v != other:
                    self[other].append(v)

        t1 = time()
        logger.info('make_undirected: added missing edges {}s'.format(t1 - t0))

        self.make_consistent()
        return self

    def make_consistent(self):
        t0 = time()
        # Reassigning existing keys does not change the dict size, so plain iteration is safe
        for k in self:
            self[k] = list(sorted(set(self[k])))

        t1 = time()
        logger.info('make_consistent: made consistent in {}s'.format(t1 - t0))

        self.remove_self_loops()

        return self

    def remove_self_loops(self):

        removed = 0
        t0 = time()

        for x in self:
            if x in self[x]:
                self[x].remove(x)
                removed += 1

        t1 = time()

        logger.info('remove_self_loops: removed {} loops in {}s'.format(removed, (t1 - t0)))
        return self

    def check_self_loops(self):
        for x in self:
            for y in self[x]:
                if x == y:
                    return True

        return False

    def has_edge(self, v1, v2):
        if v2 in self[v1] or v1 in self[v2]:
            return True
        return False

    def degree(self, nodes=None):
        if isinstance(nodes, Iterable):
            return {v: len(self[v]) for v in nodes}
        else:
            return len(self[nodes])

    def order(self):
        "Returns the number of nodes in the graph"
        return len(self)

    def number_of_edges(self):
        "Returns the number of edges in the graph"
        # Each undirected edge is counted twice, once per endpoint
        return sum([self.degree(x) for x in self.keys()]) // 2

    def number_of_nodes(self):
        "Returns the number of nodes in the graph"
        return self.order()

    def random_walk(self, path_length, alpha=0, rand=random.Random(), start=None):
        """ Returns a truncated random walk.

            path_length: Length of the random walk.
            alpha: probability of restarts.
            start: the start node of the random walk.
        """
        G = self
        if start is not None:  # a node with a falsy id such as 0 is still a valid start
            path = [start]
        else:
            # Sampling is uniform w.r.t V, and not w.r.t E
            path = [rand.choice(list(G.keys()))]

        while len(path) < path_length:
            cur = path[-1]
            if len(G[cur]) > 0:
                if rand.random() >= alpha:
                    path.append(rand.choice(G[cur]))
                else:
                    path.append(path[0])
            else:
                break
        return [str(node) for node in path]
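To see the truncated random walk in action, here is a minimal, self-contained sketch that mirrors the logic of random_walk on a plain adjacency dictionary. The helper build_undirected, the standalone function signature and the four-node example graph are assumptions made for illustration; they are not part of the DeepWalk code.

```python
import random
from collections import defaultdict


def build_undirected(edges):
    """Build a node -> sorted neighbor list mapping (hypothetical helper)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Deduplicate and sort neighbors, similar in spirit to make_consistent
    for node in adj:
        adj[node] = sorted(set(adj[node]))
    return adj


def random_walk(adj, path_length, alpha=0.0, rand=None, start=None):
    """Truncated random walk; with probability alpha, jump back to the start node."""
    if rand is None:
        rand = random.Random()
    if start is not None:
        path = [start]
    else:
        # Uniform over nodes, not over edges
        path = [rand.choice(list(adj.keys()))]

    while len(path) < path_length:
        cur = path[-1]
        if len(adj[cur]) == 0:
            break  # dead end: the walk is truncated early
        if rand.random() >= alpha:
            path.append(rand.choice(adj[cur]))
        else:
            path.append(path[0])
    return [str(node) for node in path]


adj = build_undirected([(1, 2), (2, 3), (3, 1), (3, 4)])
walk = random_walk(adj, path_length=6, rand=random.Random(0), start=1)
print(walk)
```

With alpha=0 every step moves to a neighbor of the current node, so consecutive entries of the walk are always connected by an edge. Nodes are returned as strings, matching the last line of random_walk above.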