Link prediction
These notes follow the Chinese-language lectures by 同济子豪兄 (TommyZihao). If you are interested, the detailed videos are on Bilibili.
Link:
https://github.com/TommyZihao/zihao_course/blob/main/CS224W/1-Intro.md
Link prediction
Objective:
Complete the unknown links using the known ones: the task is to predict new links based on the existing links.
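There are two approaches:
1. Extract features of the link directly and turn them into a D-dimensional vector.
2. Concatenate the D-dimensional vectors of the link's two endpoint nodes (this loses the structural information between them).
The key is to design features for the link that preserve its information.
These notes take approach 1.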
Methodology:
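For each pair of nodes (x, y), compute a score c(x, y).
Extract link features -> D-dimensional vector -> score c(x, y).
Compute the score of every candidate link, sort the scores in descending order, and take the top k pairs as the predicted links.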
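The score-sort-select procedure above can be sketched as follows. This is a minimal illustration, not the lecture's code: the helper names `top_k_links` and `common_neighbor_count` and the toy graph are assumptions made up for this example, and the common-neighbor count merely stands in for whatever score c(x, y) is chosen.

```python
from itertools import combinations

import networkx as nx


def top_k_links(graph, k, score_fn):
    """Score every unconnected node pair and return the k highest-scoring ones."""
    candidates = [
        (x, y) for x, y in combinations(graph.nodes, 2) if not graph.has_edge(x, y)
    ]
    scored = [(x, y, score_fn(graph, x, y)) for x, y in candidates]
    # Sort by score in descending order; the sort is stable, so ties keep candidate order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


def common_neighbor_count(graph, x, y):
    """Stand-in score c(x, y): the number of neighbors shared by x and y."""
    return len(list(nx.common_neighbors(graph, x, y)))


# Toy graph, made up for illustration.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
predictions = top_k_links(G, k=2, score_fn=common_neighbor_count)
print(predictions)
```

Scoring all unconnected pairs is quadratic in the number of nodes, so on a large graph the candidate set would normally be restricted first.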
Distance-based feature
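The shortest-path distance between the two nodes. It looks only at quantity (the length of the path), not quality: it does not capture how strongly the two neighborhoods overlap.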
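A small sketch of this limitation, assuming NetworkX and a toy graph invented for the example: two node pairs have the same shortest-path distance even though one pair shares more neighbors than the other.

```python
import networkx as nx

# Toy graph, made up for illustration.
# A and B are joined through X only; C and D are joined through both Y and Z.
G = nx.Graph([("A", "X"), ("X", "B"), ("C", "Y"), ("Y", "D"), ("C", "Z"), ("Z", "D")])

d_ab = nx.shortest_path_length(G, "A", "B")
d_cd = nx.shortest_path_length(G, "C", "D")

n_ab = len(list(nx.common_neighbors(G, "A", "B")))
n_cd = len(list(nx.common_neighbors(G, "C", "D")))

# Same distance, different amount of neighborhood overlap.
print(d_ab, d_cd, n_ab, n_cd)
```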
Local neighborhood overlap
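Tip: when two nodes have no common neighbors, the local overlap scores are all zero, so a whole-graph feature such as the Katz index is needed.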
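As a rough illustration, the three local overlap scores covered by the NetworkX pages listed under further reading (common neighbors, Jaccard coefficient, Adamic-Adar index) can be computed as below. The toy graph and the chosen pairs are assumptions made up for this sketch.

```python
import networkx as nx

# Toy graph, made up for illustration.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")])

# Candidate pairs: (A, D) share two neighbors, (A, E) share none.
pairs = [("A", "D"), ("A", "E")]

common = {(u, v): len(list(nx.common_neighbors(G, u, v))) for u, v in pairs}
# Both functions yield (u, v, score) triples for the requested pairs.
jaccard = {(u, v): score for u, v, score in nx.jaccard_coefficient(G, pairs)}
adamic_adar = {(u, v): score for u, v, score in nx.adamic_adar_index(G, pairs)}

for pair in pairs:
    print(pair, common[pair], jaccard[pair], adamic_adar[pair])
```

For the pair (A, E), all three scores are zero even though a link may still form later, which is the case the tip above refers to.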
Global neighborhood overlap
Katz index: count the number of walks of all lengths between a given pair of nodes.
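(The k-th power of the adjacency matrix gives the number of walks of length k between u and v. Because very long walks carry little meaning, each term is multiplied by a decay factor b raised to the walk length, and summing the resulting geometric series gives the final closed-form formula.)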
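The geometric-series sum can be sketched with NumPy. Writing the decay factor as beta and the adjacency matrix as A, the series is S = beta A + beta^2 A^2 + ... = (I - beta A)^{-1} - I, which converges only when beta is smaller than 1 divided by the largest eigenvalue magnitude of A. The function name `katz_index`, the value beta = 0.1, and the path graph are assumptions chosen for this example.

```python
import numpy as np


def katz_index(adjacency, beta):
    """Katz index matrix S = sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I."""
    A = np.asarray(adjacency, dtype=float)
    # The series only converges if beta * (spectral radius of A) < 1.
    spectral_radius = max(abs(np.linalg.eigvals(A)))
    if beta * spectral_radius >= 1:
        raise ValueError("beta must be smaller than 1 / spectral radius of A")
    identity = np.eye(A.shape[0])
    return np.linalg.inv(identity - beta * A) - identity


# Path graph 0 - 1 - 2 - 3, made up for illustration.
# Nodes 0 and 3 have no common neighbor, yet their Katz score is non-zero.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
S = katz_index(A, beta=0.1)

# Cross-check against the explicitly truncated series.
S_truncated = sum(0.1**l * np.linalg.matrix_power(A, l) for l in range(1, 30))
print(S[0, 3], np.allclose(S, S_truncated))
```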
Q: How do we compute the number of walks between two nodes?
Use powers of the graph adjacency matrix: entry (u, v) of the k-th power of the adjacency matrix is the number of walks of length k between node u and node v.
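A minimal NumPy sketch of counting walks with adjacency-matrix powers. The graph (a triangle 0-1-2 with an extra node 3 attached to node 2) is made up for illustration.

```python
import numpy as np

# Triangle 0-1-2 plus node 3 attached to node 2 (toy graph, made up).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

A2 = np.linalg.matrix_power(A, 2)  # entry (u, v): number of walks of length 2
A3 = np.linalg.matrix_power(A, 3)  # entry (u, v): number of walks of length 3

print(A2[0, 3])  # 1: the only length-2 walk is 0 -> 2 -> 3
print(A3[0, 1])  # 3: 0-1-0-1, 0-1-2-1, 0-2-0-1
```

Walks may revisit nodes and edges, which is why the count from node 0 to node 1 at length 3 includes back-and-forth steps.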
Further reading:
NetworkX documentation
- https://networkx.org/documentation/stable/reference/generated/networkx.classes.function.common_neighbors.html
- https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.link_prediction.jaccard_coefficient.html
- https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.link_prediction.adamic_adar_index.html
- https://stackoverflow.com/questions/62069781/how-to-find-the-similarity-between-pair-of-vertices-using-katz-index-in-python