pyDatalog: A Logic Programming Engine for Python [Part 3: Basic Tutorial (II)]

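Aggregate functions

Aggregate functions are a special kind of function [they operate on a collection of values rather than a single one]. We first create the data needed for the illustration.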

In [1]:
from pyDatalog import pyDatalog
pyDatalog.create_terms('X,Y,manager, count_of_direct_reports')
# the manager of Mary is John
+(manager['Mary'] == 'John')
+(manager['Sam']  == 'Mary')
+(manager['Tom']  == 'Mary')

The most basic aggregate function is len_(), which counts the number of values in a collection.

In [2]:
(count_of_direct_reports[X]==len_(Y)) <= (manager[Y]==X)
print(count_of_direct_reports['Mary']==Y)
Y
-
2

pyDatalog looks for every Y that satisfies manager[Y]=='Mary', then counts those Y values.
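For comparison, the same count can be sketched in plain Python. This is only an analogy that treats the manager function as a dictionary. It does not show how pyDatalog evaluates the clause, and the helper name is chosen for illustration.

```python
# Plain-Python analogue of count_of_direct_reports (illustration only).
manager = {"Mary": "John", "Sam": "Mary", "Tom": "Mary"}

def count_of_direct_reports(boss):
    """Count every employee Y whose manager is `boss`."""
    return len({employee for employee, m in manager.items() if m == boss})

print(count_of_direct_reports("Mary"))
```

The loop over all employees is spelled out here, whereas the pyDatalog clause leaves it implicit.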

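The aggregate functions are:

len_ (P[X]==len_(Y)) <= body : P[X] is the count of the values of Y (associated with X by the body of the clause).

sum_ (P[X]==sum_(Y, for_each=Z)) <= body : P[X] is the sum of Y for each Z. (Z is used to distinguish Y values that may be identical.)

min_, max_ (P[X]==min_(Y, order_by=Z)) <= body : P[X] is the minimum (or maximum) of Y when sorted by Z.

tuple_ (P[X]==tuple_(Y, order_by=Z)) <= body : P[X] is a tuple containing all the values of Y, sorted by Z.

concat_ (P[X]==concat_(Y, order_by=Z, sep=',')) <= body : same as sum_, but for strings. The strings are sorted by Z and separated by ','.

rank_ (P[X]==rank_(group_by=Y, order_by=Z)) <= body : P[X] is the sequence number of X in the list of Y values when that list is sorted by Z.

running_sum_ (P[X]==running_sum_(N, group_by=Y, order_by=Z)) <= body : P[X] is the sum of the values of N for each Y that comes before or is equal to X when the Y values are sorted by Z.

mean_ and linear_regression: see the reference ( https://sites.google.com/site/pydatalog/reference ).

(I do not fully understand this part. The official site has few examples, so the descriptions above may not be entirely clear.)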
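Since examples are scarce, here is a plain-Python sketch of how I read the descriptions of len_, sum_, min_, tuple_ and concat_. It is not pyDatalog code and not its implementation. The salary data, the `aggregate` helper and the tie-breaking by name are all assumptions made for illustration. rank_ and running_sum_ are left out because I am less sure of their exact semantics.

```python
from collections import defaultdict

# Hypothetical solutions of a clause body, as (department, employee, salary).
# A set is used because a logic query yields each distinct solution once.
solutions = {
    ("Production", "Mary", 3000),
    ("Production", "John", 3000),
    ("Marketing", "Sam", 2500),
    ("Marketing", "John", 4000),
}

# Group the (salary, employee) pairs by department, the X in P[X].
pairs = defaultdict(set)
for dept, employee, salary in solutions:
    pairs[dept].add((salary, employee))

def aggregate(fn):
    """Apply fn to each department's pairs, sorted by salary, then by name.
    Breaking ties by name is an assumption of this sketch."""
    return {dept: fn(sorted(rows)) for dept, rows in pairs.items()}

# len_(employee): how many employees each department has.
headcount = aggregate(len)
# sum_(salary, for_each=employee): one salary per employee,
# so two equal salaries are both counted.
payroll = aggregate(lambda rows: sum(s for s, _ in rows))
# Summing only the distinct salaries shows what for_each guards against.
collapsed = aggregate(lambda rows: sum({s for s, _ in rows}))
# min_(employee, order_by=salary): the first employee when sorted by salary.
lowest_paid = aggregate(lambda rows: rows[0][1])
# tuple_(employee, order_by=salary): every employee, sorted by salary.
by_salary = aggregate(lambda rows: tuple(e for _, e in rows))
# concat_(employee, order_by=salary, sep=','): same order, joined as a string.
names = aggregate(lambda rows: ",".join(e for _, e in rows))

print(payroll["Production"], collapsed["Production"])
```

The two printed numbers differ because Mary and John earn the same amount. Without a variable like Z to tell the two solutions apart, the identical values would be counted once.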

Literals and sets

Just as pyDatalog functions behave like Python dictionaries, pyDatalog literals behave much like Python sets.

In [3]:
from pyDatalog import pyDatalog
pyDatalog.create_terms('X,Y,Z, works_in, department_size, manager, indirect_manager, count_of_indirect_reports')

Facts are added to the set like this:

In [4]:
# Mary works in Production
+ works_in('Mary', 'Production')
+ works_in('Sam',  'Marketing')

+ works_in('John', 'Production')
+ works_in('John', 'Marketing')

Literals can likewise be queried by value, which is more concise than the equivalent native Python [the loop is hidden].

In [5]:
# give me all the X that work in Marketing
print(works_in(X,  'Marketing'))
# procedural equivalent in Python
# for i in _works_in:
#     if i[1]=='Marketing':
#         print(i[0])
X   
----
Sam 
John
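The set analogy can be made concrete with a runnable plain-Python version of the same query. This is a sketch of the analogy only: the set of tuples stands in for the works_in literal and is not how pyDatalog stores facts.

```python
# A literal viewed as a set of tuples (illustration only).
works_in = {
    ("Mary", "Production"),
    ("Sam", "Marketing"),
    ("John", "Production"),
    ("John", "Marketing"),
}

# "Give me all the X that work in Marketing", with the loop written out.
in_marketing = {name for name, dept in works_in if dept == "Marketing"}
print(sorted(in_marketing))
```

Adding the same tuple twice leaves the set unchanged, which matches the idea that a fact is either known or not.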

Literals can also be defined by clauses.

[This shows that the notion of a "literal" closely mirrors the form of a predicate in predicate logic.]

In [6]:
# one of the indirect manager of X is Y, if the (direct) manager of X is Y
indirect_manager(X,Y) <= (manager[X] == Y)
# another indirect manager of X is Y, if there is a Z so that the manager of X is Z, 
#   and an indirect manager of Z is Y
indirect_manager(X,Y) <= (manager[X] == Z) & indirect_manager(Z,Y)
print(indirect_manager('Sam',X))
X   
----
Mary
John

Note that the two separate clauses together express an implicit "or".

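[My own summary of how "literals" differ from and relate to the "functions" of the previous post:
1. A literal uses parentheses and has no value (so == cannot be applied to it). A function uses square brackets and yields one value for the elements inside the brackets.
2. In use, manager[X] == Y and manager(X,Y) are similar, but for a lookup by key, manager['Mary'] == X is more efficient than manager('Mary',X): the former is a hash lookup, while the latter still needs a loop.]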
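The dictionary and set analogy behind that summary can be sketched in plain Python. The variable names are made up for illustration, and whether pyDatalog's internals match this cost model is the claim made in the summary, not something this snippet verifies.

```python
# Function-like storage: one value per key, found by a hash lookup.
manager_function = {"Mary": "John", "Sam": "Mary", "Tom": "Mary"}
by_key = manager_function["Sam"]

# Literal-like storage: a set of pairs, searched by scanning every pair.
manager_literal = set(manager_function.items())
by_scan = [boss for employee, boss in manager_literal if employee == "Sam"]

print(by_key, by_scan)
```

The dictionary can hold only one manager per employee, while the set of pairs could hold several, which mirrors the "function has a value, literal does not" distinction.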

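When resolving a query, pyDatalog can remember intermediate results, a technique called memoization. This makes queries faster, and it also keeps recursive definitions from falling into an infinite loop!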

In [7]:
# the manager of John is Mary (whose manager is John !)
manager['John'] = 'Mary'
manager['Mary'] = 'John'
print(indirect_manager('John',X))       # no infinite loop
X   
----
John
Mary

This makes pyDatalog a good tool for implementing recursive algorithms on complex data structures, such as those that represent networks.
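To see why remembering results prevents the loop, here is a minimal plain-Python sketch of the same query over the circular data. The `found` set plays a role loosely similar to memoization. pyDatalog's actual resolution strategy is more general and is not reproduced here.

```python
# Circular management chain: John -> Mary -> John -> ...
manager = {"Mary": "John", "Sam": "Mary", "Tom": "Mary", "John": "Mary"}

def indirect_managers(person):
    """Follow the chain upwards, stopping at a manager already recorded."""
    found = set()
    current = manager.get(person)
    while current is not None and current not in found:
        found.add(current)
        current = manager.get(current)
    return found

print(sorted(indirect_managers("John")))
```

Without the `current not in found` test, the loop would alternate between Mary and John forever.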

Facts can also be removed:

In [8]:
# John does not work in Production anymore
- works_in('John', 'Production')
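# Addendum:
# [Clauses can be added and removed in the same way as facts, but mind the parentheses:
#  adding a clause takes no plus sign, while removing one needs a minus sign]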
# - (indirect_manager(X,Y) <= (manager[X] == Z) & indirect_manager(Z,Y))
# print(indirect_manager('John',X))

Aggregate functions can also be defined on literals:

In [9]:
(count_of_indirect_reports[X]==len_(Y)) <= indirect_manager(Y,X)
print(count_of_indirect_reports['John']==Y)             
Y
-
4

In [10]:
# a small inference rule of my own
pyDatalog.create_terms('X,Y,Z,father,fatherOf,grandfatherOf')
(grandfatherOf[X] == Z) <= ((fatherOf[X]==Y) & (fatherOf[Y]==Z))
fatherOf["乾隆"] = "雍正"
fatherOf["雍正"] = "康熙"
print(grandfatherOf["乾隆"] == X)
X 
--
康熙
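The rule chains two function lookups. A plain-Python reading of it, using a dictionary and a hypothetical helper name, might look like this:

```python
# fatherOf as a dictionary: 乾隆 (Qianlong), 雍正 (Yongzheng), 康熙 (Kangxi).
father_of = {"乾隆": "雍正", "雍正": "康熙"}

def grandfather_of(person):
    """Return the father's father, or None when either step is unknown."""
    father = father_of.get(person)
    return father_of.get(father) if father is not None else None

print(grandfather_of("乾隆"))
```

Unlike this helper, the pyDatalog rule can also be queried in the other direction, for example to ask whose grandfather a given person is.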

Trees, graphs, and recursive algorithms

Trees and graphs can be defined by the links between their nodes:

In [11]:
pyDatalog.create_terms('link, can_reach')

# there is a link between node 1 and node 2
+link(1,2)
+link(2,3)
+link(2,4)
+link(2,5)
+link(5,6)
+link(6,7)
+link(7,2)

# undirected graph: every link works in both directions
link(X,Y) <= link(Y,X)
Out[11]:
link(X,Y) <= link(Y,X)

The following two clauses define when node Y can be reached from node X:

In [12]:
# can Y be reached from X ?
can_reach(X,Y) <= link(X,Y) # direct link
# via Z
can_reach(X,Y) <= link(X,Z) & can_reach(Z,Y) & (X!=Y)

print (can_reach(1,Y))
Y
-
2
6
7
3
4
5

Note that pyDatalog is smart enough to resolve the query even though the graph contains cycles.
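For readers who want the procedural counterpart, the reachability query can be sketched as a breadth-first search in plain Python. This is a substitute technique shown for comparison, not a description of how pyDatalog resolves can_reach.

```python
from collections import deque

# The same links as above, stored in both directions (undirected graph).
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (5, 6), (6, 7), (7, 2)]
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

def can_reach(start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:  # the cycle 2-5-6-7-2 is cut off here
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(sorted(can_reach(1)))
```

The `seen` set has to be managed by hand here, whereas the two pyDatalog clauses only state what "reachable" means.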

More graph algorithms can be found in this example ( https://github.com/pcarbonn/pyDatalog/blob/master/pyDatalog/examples/graph.py ).

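The 8-queens problem

By combining what we have learned so far, we can tackle complex problems declaratively and let the computer work out the procedure for solving them. As an example, let us write a program that finds a valid solution to the 8-queens problem. A solution for the general N-queens problem can be found here ( https://github.com/pcarbonn/pyDatalog/blob/master/pyDatalog/examples/queens_N.py ).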

In [13]:
from pyDatalog import pyDatalog
pyDatalog.create_terms('N,X0,X1,X2,X3,X4,X5,X6,X7')
pyDatalog.create_terms('ok,queens,next_queen')

# the queen in the first column can be in any row
queens(X0)                      <= (X0._in(range(8)))

# to find the queens in the first 2 columns, find the first one first, then find a second one
queens(X0,X1)                   <= queens(X0)                   & next_queen(X0,X1)

# repeat for the following queens
queens(X0,X1,X2)                <= queens(X0,X1)                & next_queen(X0,X1,X2)
queens(X0,X1,X2,X3)             <= queens(X0,X1,X2)             & next_queen(X0,X1,X2,X3)
queens(X0,X1,X2,X3,X4)          <= queens(X0,X1,X2,X3)          & next_queen(X0,X1,X2,X3,X4)
queens(X0,X1,X2,X3,X4,X5)       <= queens(X0,X1,X2,X3,X4)       & next_queen(X0,X1,X2,X3,X4,X5)
queens(X0,X1,X2,X3,X4,X5,X6)    <= queens(X0,X1,X2,X3,X4,X5)    & next_queen(X0,X1,X2,X3,X4,X5,X6)
queens(X0,X1,X2,X3,X4,X5,X6,X7) <= queens(X0,X1,X2,X3,X4,X5,X6) & next_queen(X0,X1,X2,X3,X4,X5,X6,X7)

# the second queen can be in any row, provided it is compatible with the first one
next_queen(X0,X1)                   <= queens(X1)                       & ok(X0,1,X1)

# to find the third queen, first find a queen compatible with the second one, then with the first
# re-use the previous clause for maximum speed, thanks to memoization
next_queen(X0,X1,X2)                <= next_queen(X1,X2)                & ok(X0,2,X2)

# repeat for all queens
next_queen(X0,X1,X2,X3)             <= next_queen(X1,X2,X3)             & ok(X0,3,X3)
next_queen(X0,X1,X2,X3,X4)          <= next_queen(X1,X2,X3,X4)          & ok(X0,4,X4)
next_queen(X0,X1,X2,X3,X4,X5)       <= next_queen(X1,X2,X3,X4,X5)       & ok(X0,5,X5)
next_queen(X0,X1,X2,X3,X4,X5,X6)    <= next_queen(X1,X2,X3,X4,X5,X6)    & ok(X0,6,X6)
next_queen(X0,X1,X2,X3,X4,X5,X6,X7) <= next_queen(X1,X2,X3,X4,X5,X6,X7) & ok(X0,7,X7)

# it's ok to have one queen in row X1 and another in row X2 if they are separated by N columns
ok(X1, N, X2) <= (X1 != X2) & (X1 != X2+N) & (X1 != X2-N)

# give me one solution to the 8-queen puzzle
print(queens(X0,X1,X2,X3,X4,X5,X6,X7).data[0])
(7, 3, 0, 2, 5, 1, 6, 4)
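As a cross-check, the same compatibility test can be reused in a conventional backtracking search in plain Python. This is a sketch of the procedural approach for comparison, not of pyDatalog's search. The names `is_valid` and `solve` are made up for illustration, and the first solution it finds need not be the one printed above.

```python
def ok(row_a, n, row_b):
    """Two queens n columns apart are compatible if they share no row or diagonal."""
    return row_a != row_b and row_a != row_b + n and row_a != row_b - n

def is_valid(rows):
    """Check every pair of queens; rows[c] is the row of the queen in column c."""
    size = len(rows)
    return all(ok(rows[i], j - i, rows[j])
               for i in range(size) for j in range(i + 1, size))

def solve(n, placed=()):
    """Place one queen per column, backtracking when no row is compatible."""
    if len(placed) == n:
        return placed
    col = len(placed)
    for row in range(n):
        if all(ok(placed[c], col - c, row) for c in range(col)):
            result = solve(n, placed + (row,))
            if result is not None:
                return result
    return None

print(is_valid((7, 3, 0, 2, 5, 1, 6, 4)))
print(solve(8))
```

The first print confirms that the tuple returned by pyDatalog above is a valid placement under the same ok rule.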

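Posts in this series:

pyDatalog: A Logic Programming Engine for Python (for inference, queries, and more) [Part 1: Preface]

pyDatalog: A Logic Programming Engine for Python [Part 2: Basic Tutorial (I)]

pyDatalog: A Logic Programming Engine for Python [Part 3: Basic Tutorial (II)]

pyDatalog: A Logic Programming Engine for Python [Part 4: Loading and Running Programs from Files]

pyDatalog: A Logic Programming Engine for Python [Part 5: Interacting with a "Knowledge Graph"]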


Reposted from blog.csdn.net/blmoistawinde/article/details/80872078