Triangle-Counting Community Detection on Neo4j Graph Data

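Communities form in networks of every kind, and identifying them is important for evaluating group behavior and emergent phenomena. The general principle of community detection is that members of a community have more relationships with one another than with nodes outside the group. Identifying these related sets reveals clusters of nodes, isolated groups, and network structure. This information helps infer similar behavior or preferences within peer groups, estimate resiliency, find nested relationships, and prepare data for other analyses. Community detection algorithms are also commonly used to produce network visualizations for general inspection.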
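The general principle above (more relationships inside the group than outside it) can be checked directly for any candidate group. The following is a minimal Python sketch, not part of the original post; the function name `internal_external_counts` and the toy edge list are illustrative assumptions, and relationships are treated as undirected.

```python
def internal_external_counts(edges, community):
    """Count relationships inside a candidate community vs. crossing its boundary.

    edges: iterable of (u, v) pairs, treated as undirected.
    community: iterable of node identifiers forming the candidate group.
    """
    members = set(community)
    internal = 0
    external = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1      # both endpoints are members
        elif u_in or v_in:
            external += 1      # exactly one endpoint is a member
    return internal, external


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
counts = internal_external_counts(toy_edges, {"a", "b", "c"})
print(counts)  # (3, 1)
```

Here the group {a, b, c} has three internal relationships and only one leaving it, which is the pattern a community detection algorithm looks for.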

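When using community detection algorithms, pay attention to the density of the relationships. If the graph is very dense, you may end up with all nodes congregating in one or just a few clusters. You can counteract this by filtering on degree, relationship weight, or a similarity metric.
On the other hand, if the graph is too sparse, with few connected nodes, you may end up with each node in its own cluster. In that case, try to incorporate additional relationship types that carry more relevant information.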
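To make the density check concrete, here is a small Python sketch that computes the density of an undirected graph and applies one of the mitigations mentioned above, filtering by degree. The helper names `density` and `drop_high_degree`, the threshold, and the toy edge list are assumptions made for illustration; they are not part of Neo4j or of the original post.

```python
from collections import defaultdict


def density(node_count, edge_count):
    """Undirected density: existing edges divided by possible edges."""
    if node_count < 2:
        return 0.0
    return 2 * edge_count / (node_count * (node_count - 1))


def drop_high_degree(edges, max_degree):
    """Keep only edges whose endpoints both have degree <= max_degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [(u, v) for u, v in edges
            if degree[u] <= max_degree and degree[v] <= max_degree]


# Hypothetical toy graph: a hub connected to four nodes, plus one edge a-b.
hub_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
nodes = {n for edge in hub_edges for n in edge}

print(density(len(nodes), len(hub_edges)))  # 0.5
print(drop_high_degree(hub_edges, 3))       # [('a', 'b')]
```

Removing the hub (degree 4) leaves only the relationship between a and b. In a real graph the threshold would be chosen by inspecting the degree distribution first.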

1. Importing the nodes

WITH "https://github.com/neo4j-graph-analytics/book/raw/master/data/" AS base
WITH base + "sw-nodes.csv" AS uri
LOAD CSV WITH HEADERS FROM uri AS row
MERGE (:Library {id: row.id})

2. Importing the relationships

WITH "https://github.com/neo4j-graph-analytics/book/raw/master/data/" AS base
WITH base + "sw-relationships.csv" AS uri
LOAD CSV WITH HEADERS FROM uri AS row
MATCH (source:Library {id: row.src})
MATCH (destination:Library {id: row.dst})
MERGE (source)-[:DEPENDS_ON]->(destination)

3. Querying the individual triangles

CALL algo.triangle.stream("Library","DEPENDS_ON",{concurrency:4})
YIELD nodeA, nodeB, nodeC
RETURN algo.getNodeById(nodeA).id AS nodeA,
algo.getNodeById(nodeB).id AS nodeB,
algo.getNodeById(nodeC).id AS nodeC;

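// Same query on a graph whose node label is 新浪微博ID (Sina Weibo ID) and whose relationship type is 关注 (follows)
CALL algo.triangle.stream("新浪微博ID","关注",{concurrency:4})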
YIELD nodeA, nodeB, nodeC
RETURN algo.getNodeById(nodeA).nameNodeSpace AS nodeA,
algo.getNodeById(nodeB).nameNodeSpace AS nodeB,
algo.getNodeById(nodeC).nameNodeSpace AS nodeC;
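To illustrate what a triangle stream returns, the sketch below enumerates triangles in plain Python. It is not the implementation behind `algo.triangle.stream`; it assumes relationships are treated as undirected, and the function name `triangles` and the toy edge list are hypothetical. Each triangle is reported once, with its three nodes in sorted order.

```python
from collections import defaultdict


def triangles(edges):
    """Return every triangle (a, b, c) with a < b < c, treating edges as undirected."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:              # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    found = []
    for a in sorted(adj):
        # Only look at neighbors greater than a so each triangle is reported once.
        for b in sorted(n for n in adj[a] if n > a):
            # A common neighbor of a and b closes a triangle.
            for c in sorted(n for n in adj[a] & adj[b] if n > b):
                found.append((a, b, c))
    return found


# Hypothetical toy graph: two triangles sharing the edge a-c.
tri_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a")]
print(triangles(tri_edges))  # [('a', 'b', 'c'), ('a', 'c', 'd')]
```

The ordering constraint a < b < c plays the same role as returning each `nodeA, nodeB, nodeC` triple a single time rather than once per permutation.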

4. Triangle count and local clustering coefficient (and the global clustering coefficient)

CALL algo.triangleCount.stream('Library', 'DEPENDS_ON',{concurrency:8})
YIELD nodeId, triangles, coefficient
WHERE coefficient > 0
RETURN algo.getNodeById(nodeId).id AS library, coefficient
ORDER BY coefficient DESC;

// Same query on the 新浪微博ID (Sina Weibo ID) / 关注 (follows) graph
CALL algo.triangleCount.stream('新浪微博ID', '关注',{concurrency:8})
YIELD nodeId, triangles, coefficient
WHERE coefficient > 0
RETURN algo.getNodeById(nodeId).nameNodeSpace AS library, coefficient
ORDER BY coefficient DESC;

// Streaming variant using the forkJoin implementation
CALL algo.triangleCount.forkJoin.stream('Library', 'DEPENDS_ON',{concurrency:8})
YIELD nodeId, triangles, coefficient
WHERE coefficient > 0
RETURN algo.getNodeById(nodeId).id AS library, coefficient
ORDER BY coefficient DESC;

// Write the triangle count and coefficient back to each node and return summary statistics
CALL algo.triangleCount('Library', 'DEPENDS_ON',{concurrency:4, write:true, writeProperty:'triangles', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient
RETURN loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;

// Write-back variant using the forkJoin implementation
CALL algo.triangleCount.forkJoin('Library', 'DEPENDS_ON',{concurrency:4, write:true, writeProperty:'triangles', clusteringCoefficientProperty:'coefficient'})
YIELD loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient
RETURN loadMillis, computeMillis, writeMillis, nodeCount, triangleCount, averageClusteringCoefficient;

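In a social network, a high local clustering coefficient for a person means that the person's contacts are densely connected to one another, so the person sits in a cohesive, tightly knit group.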
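The local clustering coefficient of a node with k neighbors that takes part in T triangles is commonly defined as 2T / (k(k - 1)): the fraction of pairs of neighbors that are themselves connected. The Python sketch below computes the triangle count and this coefficient per node, plus the average over all nodes. It is an illustration under the assumption of an undirected graph, not the code behind `algo.triangleCount`; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(edges):
    """Map each node to (triangle count, local clustering coefficient)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:              # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    result = {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        # Each connected pair of neighbors closes one triangle through `node`.
        tri = sum(1 for x, y in combinations(neighbors, 2) if y in adj[x])
        coefficient = 2 * tri / (k * (k - 1)) if k > 1 else 0.0
        result[node] = (tri, coefficient)
    return result


# Hypothetical toy graph: two triangles sharing the edge a-c.
cc_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a")]
scores = local_clustering(cc_edges)
average = sum(c for _, c in scores.values()) / len(scores)

for node in sorted(scores):
    print(node, scores[node])
print("average clustering coefficient:", average)
```

In this toy graph, b and d each have two neighbors that are connected to each other, so their coefficient is 1.0, while a and c have three neighbors of which only two of the three pairs are connected, giving 2/3.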

Reposted from blog.csdn.net/superman_xxx/article/details/104905115