Common Metric Properties of Networks

1 Degree distribution

The degree distribution $P(k)$ of a network is the probability that a randomly selected node has degree $k$. Let $N_k$ denote the number of nodes with degree $k$. Dividing by the total number of nodes $N$ gives the normalized probability mass function:

$P(k) = N_k / N \quad (k \in \mathbb{N})$

We have $\sum_{k \in \mathbb{N}} P(k) = 1$.
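The definition above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `degree_distribution`, the edge-list input, and the 4-node example graph are assumptions made here, not part of the original material.

```python
from collections import Counter

def degree_distribution(edges, num_nodes):
    """Return {k: P(k)} for an undirected graph given as an edge list.

    Assumption of this sketch: nodes are labeled 0..num_nodes-1.
    """
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)  # N_k: number of nodes with degree k
    return {k: n_k / num_nodes for k, n_k in sorted(counts.items())}

# Hypothetical example: a triangle 0-1-2 plus a pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
dist = degree_distribution(edges, 4)
print(dist)                # P(k) for each degree k that occurs
print(sum(dist.values()))  # the probabilities sum to 1
```

Nodes with degree 0 are counted as well, because every node starts with a degree of zero before the edges are scanned.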
For this network:

The normalized degree distribution histogram can be expressed as follows:

2 Paths

2.1 Paths in a graph

A path in a graph is a sequence of nodes in which each node is linked to the next node in the sequence. (Terminology varies between textbooks: some call this a walk and reserve the term "path" for simple paths.) A path can be written either as a sequence of nodes or as a sequence of edges:

$P_n = \{i_0, i_1, i_2, \dots, i_n\}$
$P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots, (i_{n-1}, i_n)\}$

A path can intersect itself and pass through the same edge multiple times. In the figure below, for example, the path ABDCDEG intersects itself.

Note that paths in directed graphs can only follow the direction of edges.
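As a rough sketch of this definition, the following Python snippet checks whether a node sequence is a path in a directed graph, following edge directions. The function name `is_path` and the small adjacency dictionary are hypothetical and chosen only for illustration.

```python
def is_path(adj, nodes):
    """True if every consecutive pair (u, v) in `nodes` is a directed edge u -> v.

    `adj` maps each node to the set of nodes it points to.
    Repeated nodes and edges are allowed, as in the definition above.
    """
    return all(v in adj.get(u, ()) for u, v in zip(nodes, nodes[1:]))

# Hypothetical directed graph: A -> B, B -> C, C -> A, C -> B.
adj = {"A": {"B"}, "B": {"C"}, "C": {"A", "B"}}

print(is_path(adj, ["A", "B", "C", "B"]))  # follows A->B, B->C, C->B
print(is_path(adj, ["B", "A"]))            # there is no edge B->A
```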

2.2 Number of paths

The number of paths is defined as the number of paths of a given length between nodes $u$ and $v$. It turns out that the powers of the adjacency matrix $A$ are directly related to the number of paths.

  • Path count matrix for length $h=1$ (here $h$ can be read as the number of hops): we only need to check whether there is a link between $u$ and $v$, that is,

    $H^{(1)}_{uv} = A_{uv}$

  • Path count matrix for length $h=2$: we need to count the paths of length $2$ between $u$ and $v$, that is, count the nodes $k$ satisfying $A_{uk} A_{kv} = 1$:

    $H^{(2)}_{uv} = \sum_{k=1}^{N} A_{uk} A_{kv} = [A^2]_{uv}$

  • Path count matrix for length $h$: we need to count the paths of length $h$ between $u$ and $v$, that is, count all sequences $\langle k_1, k_2, \dots, k_{h-1} \rangle$ satisfying $A_{u k_1} A_{k_1 k_2} \cdots A_{k_{h-1} v} = 1$:

    $H^{(h)}_{uv} = [A^h]_{uv}$

The above conclusion holds for both directed and undirected graphs. It also implies that if a shortest path exists between $u$ and $v$, its length is the smallest $k$ for which $[A^k]_{uv}$ is non-zero.
As a corollary, a simple way to find the lengths of all shortest paths in a graph of $n$ nodes is to compute successive powers of the adjacency matrix $A$, up to the $(n-1)$-th power, and record for each element the power at which it first becomes positive. This idea has an important application in the Floyd-Warshall shortest path algorithm.
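The relationship between matrix powers and path counts can be sketched in plain Python. The helper names `mat_mul` and `distances_from_powers` and the 4-node path graph are assumptions made for this illustration. The approach is the naive successive-power method described above, not an efficient shortest path algorithm.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distances_from_powers(A):
    """dist[u][v] = smallest h >= 1 with [A^h]_{uv} > 0.

    The diagonal is set to 0, and pairs that never become positive
    within n-1 powers are left as None (unreachable).
    """
    n = len(A)
    dist = [[0 if u == v else None for v in range(n)] for u in range(n)]
    power = [row[:] for row in A]  # current power A^h, starting with h = 1
    for h in range(1, n):
        for u in range(n):
            for v in range(n):
                if dist[u][v] is None and power[u][v] > 0:
                    dist[u][v] = h
        power = mat_mul(power, A)
    return dist

# Hypothetical undirected path graph 0 - 1 - 2 - 3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

A2 = mat_mul(A, A)
print(A2[0][2])  # number of length-2 paths from node 0 to node 2
print(A2[1][1])  # number of length-2 paths from node 1 back to itself
dist = distances_from_powers(A)
print(dist[0])   # distances from node 0 to every node
```

Note that $[A^2]_{11}$ counts the round trips 1-0-1 and 1-2-1, which shows that these counts include paths that revisit nodes.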

2.3 Distance

The distance between two nodes in a graph is defined as the number of edges on the shortest path between them (if the two nodes are not connected, the distance is usually defined as infinity).
In the following figure, the distance between $B$ and $D$ is $h_{B,D} = 2$, and the distance between $A$ and $X$ is $h_{A,X} = \infty$.

Note that distances in directed graphs must follow the direction of the edges, so distance in a directed graph is not symmetric. For example, in the following figure we have $h_{A,C} \neq h_{C,A}$.

We define the diameter of a graph as the maximum distance between any pair of nodes.

2.4 Average path length

The average path length of a connected undirected graph (or a connected component) or of a strongly connected directed graph (or a strongly connected component) is defined as:

$\bar{h} = \frac{1}{2E_{max}} \sum_{i, j \neq i} h_{ij}$

Here $h_{ij}$ is the distance from node $i$ to node $j$, and $E_{max} = \frac{n(n-1)}{2}$ is the maximum possible number of edges. The factor $2$ in $2E_{max}$ is optional; different textbooks define it differently.
When computing the average path length, we usually only consider the distances between connected nodes (that is, we ignore paths of length "infinity").
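As a small worked check of the definition above, consider a hypothetical 3-node path graph $1 - 2 - 3$ (chosen here for illustration, not taken from the figures). Keeping the factor $2$ and summing over ordered pairs gives:

```latex
E_{max} = \frac{3 \cdot 2}{2} = 3, \qquad
\sum_{i, j \neq i} h_{ij} = 2\,(h_{12} + h_{23} + h_{13}) = 2\,(1 + 1 + 2) = 8, \qquad
\bar{h} = \frac{8}{2 \cdot 3} = \frac{4}{3}
```

The same value is obtained by averaging the three unordered distances directly, $(1 + 1 + 2)/3 = 4/3$.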

2.5 Finding the shortest path

For unweighted graphs, we can find shortest paths with breadth-first search (BFS).

  • Start at node $u$, label it $h_u(u) = 0$, and enqueue it.
  • While the queue is not empty:
    • Remove the first element $v$ from the queue, add each of its unlabeled neighbors $w$ to the queue, and label it $h_u(w) = h_u(v) + 1$.
    • Repeat.
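The steps above can be sketched as follows. This is a minimal illustration that assumes the graph is stored as an adjacency dictionary. The function name `bfs_distances` and the example graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return {node: distance from source} for all nodes reachable from source."""
    dist = {source: 0}            # h_u(u) = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()       # remove the first element of the queue
        for w in adj.get(v, ()):
            if w not in dist:     # only label nodes that are not yet labeled
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical undirected graph; node X is isolated.
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "X": [],
}
dist = bfs_distances(adj, "A")
print(dist)
```

Nodes that never appear in the result (such as `X` here) are unreachable from the source, which corresponds to a distance of infinity.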

For weighted graphs, we need algorithms such as Dijkstra or Bellman-Ford, which are not covered here.

3 Clustering coefficient

The clustering coefficient of node $i$ can be understood intuitively as the fraction of pairs of neighbors of node $i$ that are connected to each other. Let the degree of node $i$ be $k_i$; its clustering coefficient $C_i$ is then defined as

$C_i = \frac{2 e_i}{k_i (k_i - 1)}$

Here $e_i$ is the number of edges between the neighbors of node $i$, and we have $C_i \in [0, 1]$. Some examples of clustering coefficients are shown below:

The average clustering coefficient of a graph is defined as:

$C = \frac{1}{N} \sum_{i}^{N} C_i$
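Both definitions can be sketched in Python for an undirected graph stored as a dictionary of neighbor sets. The function names and the example graph are assumptions of this sketch. Returning 0 for nodes with fewer than two neighbors is one common convention, assumed here because the formula is undefined in that case.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = 2 * e_i / (k_i * (k_i - 1)) for node i of an undirected graph."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined case treated as 0
    # e_i: number of edges between pairs of neighbors of i
    e = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * e / (k * (k - 1))

def average_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

c0 = clustering_coefficient(adj, 0)  # one linked pair out of three neighbor pairs
avg = average_clustering(adj)
print(c0, avg)
```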

4 Properties of real-world networks

Next, let's look at a real example: the MSN messaging network (directed).

The network had 245 million registered users, of whom 180 million took part in chats, with more than 30 billion replies and more than 255 billion messages exchanged.

Connectivity

Degree distribution

The degree distribution is highly skewed, with an average degree of $14.4$.

Log-log degree distribution

Clustering coefficient

Here, for convenience of plotting, the horizontal axis is the degree $k$ and the vertical axis $C_k$ is the average clustering coefficient $C_i$ over the nodes with degree $k$, that is, $C_k = \frac{1}{N_k} \sum_{i: k_i = k} C_i$.
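The per-degree average $C_k$ can be sketched as a simple grouping step. The function name `clustering_by_degree` and the input values below are made up for illustration. They are not MSN data.

```python
from collections import defaultdict

def clustering_by_degree(degrees, coefficients):
    """Return {k: C_k}, where C_k averages C_i over all nodes i with k_i = k.

    degrees[i] is k_i and coefficients[i] is C_i for the same node i.
    """
    buckets = defaultdict(list)
    for k, c in zip(degrees, coefficients):
        buckets[k].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}

# Hypothetical values for four nodes (not taken from the MSN network).
degrees = [3, 2, 2, 1]
coefficients = [1 / 3, 1.0, 1.0, 0.0]
c_k = clustering_by_degree(degrees, coefficients)
print(c_k)
```

Plotting `c_k` with the degree on the horizontal axis gives the kind of curve described above.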

The average clustering coefficient of the entire network is $0.11$.

Distance distribution

The average path length is $6.6$, and $90\%$ of the nodes can be reached within $8$ hops.

References

[1] CS224W | Home
[2] Easley D, Kleinberg J. Networks, crowds, and markets: Reasoning about a highly connected world[M]. Cambridge university press, 2010.
[3] Barabási A L. Network science[J]. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2013, 371(1987): 20120375.
[4] "A Review of Graph Theory Concepts" (《图论概念梳理》)

Origin blog.csdn.net/weixin_47367099/article/details/127655918