A complete guide to graph data processing in Python from raw edge lists to adjacency matrices

This article is shared from the Huawei Cloud Community article "A Complete Guide to Implementing Graph Data Processing in Python from Original Edge List to Adjacency Matrix" by Lemony Hug.

In graph theory and network analysis, a graph is a fundamental data structure that consists of nodes (or vertices) and the edges connecting them. In Python, we can represent a graph using an adjacency matrix, where the rows and columns correspond to nodes and each entry indicates whether an edge connects the corresponding pair of nodes.

Original edge list

Suppose we have a raw edge list in which each element represents one edge, for example:

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

In this example, each tuple (a, b) represents an edge between node a and node b.
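As a quick illustration of what an edge list already tells us, the following minimal sketch collects the node set and counts each node's degree directly from the tuples. The names nodes and degree are illustrative and not part of the original article, and the graph is assumed to be undirected.

```python
from collections import Counter

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Every node that appears at either end of an edge
nodes = sorted({node for edge in edges for node in edge})

# In an undirected graph, each edge adds one to the degree of both endpoints
degree = Counter(node for edge in edges for node in edge)

print(nodes)
print(dict(degree))
```

Here node 2 has the highest degree because it takes part in three of the four edges.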

Convert to adjacency matrix

We first need to determine the number of nodes in the graph and then create a zero matrix of corresponding size. Next, we traverse the original edge list and set the corresponding matrix element to 1 based on the two nodes of each edge. The final matrix obtained is the adjacency matrix we need.

Let’s see how to implement this process in Python code:

def edges_to_adjacency_matrix(edges):
    # Number of nodes = largest node label + 1 (labels are assumed to be 0..n-1)
    max_node = max(max(edge) for edge in edges) + 1
    
    # Create a zero matrix of size max_node x max_node
    adjacency_matrix = [[0] * max_node for _ in range(max_node)]
    
    # Traverse the original edge list and update the adjacency matrix
    for edge in edges:
        adjacency_matrix[edge[0]][edge[1]] = 1
        adjacency_matrix[edge[1]][edge[0]] = 1 # If it is an undirected graph, the edges are bidirectional
    
    return adjacency_matrix

# test
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adjacency_matrix = edges_to_adjacency_matrix(edges)
for row in adjacency_matrix:
    print(row)

In this code, the edges_to_adjacency_matrix function accepts the raw edge list as its argument and returns the corresponding adjacency matrix. We then test it on the given edge list and print the resulting adjacency matrix row by row.
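The function above uses node labels directly as matrix indices, so it implicitly assumes that nodes are labeled with the integers 0 to n-1. When an edge list uses other labels, such as strings, one possible approach is to map the labels to indices first. The helper below, relabel_edges, is a hypothetical name introduced only for this sketch.

```python
def relabel_edges(edges):
    """Map arbitrary hashable node labels to indices 0..n-1 (hypothetical helper)."""
    index_of = {}
    relabeled = []
    for a, b in edges:
        for label in (a, b):
            # Assign the next free index the first time a label is seen
            if label not in index_of:
                index_of[label] = len(index_of)
        relabeled.append((index_of[a], index_of[b]))
    return relabeled, index_of

named_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
int_edges, index_of = relabel_edges(named_edges)
print(int_edges)
print(index_of)
```

The relabeled list can then be passed to edges_to_adjacency_matrix, and index_of can be kept to translate row and column numbers back to the original labels.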

Expand and optimize

Although the above code can complete the conversion of the original edge list to the adjacency matrix, some extensions and optimizations may be required in practical applications.

Processing directed and undirected graphs: The code above assumes an undirected graph. For a directed graph, the adjacency relationship should be set in one direction only.
Handling weights: Sometimes an edge carries a weight rather than simply being present or absent. The code can be modified to store that weight in the matrix and so support weighted graphs.
Use sparse matrices: For large graphs, a dense adjacency matrix may take up a lot of memory. Consider using a sparse matrix to save space.
Performance optimization: For large edge lists, the running time of the conversion also matters. More efficient data structures or algorithms can be used to implement it.

Here is an optimized version of the code:

import numpy as np

def edges_to_adjacency_matrix(edges, directed=False):
    max_node = max(max(edge) for edge in edges) + 1
    adjacency_matrix = np.zeros((max_node, max_node))
    for edge in edges:
        if directed:
            adjacency_matrix[edge[0]][edge[1]] = 1
        else:
            adjacency_matrix[edge[0]][edge[1]] = 1
            adjacency_matrix[edge[1]][edge[0]] = 1
    return adjacency_matrix

# test
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adjacency_matrix = edges_to_adjacency_matrix(edges)
print("Adjacency matrix of undirected graph:")
print(adjacency_matrix)

directed_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
directed_adjacency_matrix = edges_to_adjacency_matrix(directed_edges, directed=True)
print("\nAdjacency matrix of directed graph:")
print(directed_adjacency_matrix)

In the optimized code, we use the NumPy library to create and manipulate the matrix, which can improve the performance and readability of the code. We also added a directed parameter that indicates the type of graph, so the function supports both directed and undirected graphs.
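The version above still visits the edges one at a time in a Python loop. As a rough sketch of how the loop could be avoided, the variant below converts the edge list to a NumPy array and sets all entries with a single indexed assignment. The function name edges_to_adjacency_matrix_vectorized and the int8 dtype are choices made for this illustration, not part of the original code.

```python
import numpy as np

def edges_to_adjacency_matrix_vectorized(edges, directed=False):
    # Assumes a non-empty list of (a, b) pairs with non-negative integer labels
    edge_array = np.asarray(edges, dtype=np.intp)
    rows, cols = edge_array[:, 0], edge_array[:, 1]
    num_nodes = int(edge_array.max()) + 1

    adjacency_matrix = np.zeros((num_nodes, num_nodes), dtype=np.int8)
    # Fancy indexing sets every listed entry in one assignment
    adjacency_matrix[rows, cols] = 1
    if not directed:
        # Mirror each edge for an undirected graph
        adjacency_matrix[cols, rows] = 1
    return adjacency_matrix

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
undirected = edges_to_adjacency_matrix_vectorized(edges)
directed = edges_to_adjacency_matrix_vectorized(edges, directed=True)
print(undirected)
print(directed)
```

For a handful of edges the difference is negligible; the indexed assignment is mainly of interest when the edge list is large.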

Optimize memory footprint using sparse matrices

When dealing with large graphs, the adjacency matrix can become very sparse, with most of its elements being zeros. To optimize memory usage, a sparse matrix can be used to represent adjacency relationships.

Several Python libraries can handle sparse matrices. Among them, SciPy provides a range of operations and algorithms for sparse matrices. Let's take a look at how to optimize the code using SciPy's sparse matrices:

import numpy as np
from scipy.sparse import lil_matrix

def edges_to_adjacency_matrix(edges, directed=False):
    max_node = max(max(edge) for edge in edges) + 1
    adjacency_matrix = lil_matrix((max_node, max_node), dtype=np.int8)
    for edge in edges:
        if directed:
            adjacency_matrix[edge[0], edge[1]] = 1
        else:
            adjacency_matrix[edge[0], edge[1]] = 1
            adjacency_matrix[edge[1], edge[0]] = 1
    return adjacency_matrix

# test
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adjacency_matrix = edges_to_adjacency_matrix(edges)
print("Adjacency matrix of undirected graph:")
print(adjacency_matrix.toarray())

directed_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
directed_adjacency_matrix = edges_to_adjacency_matrix(directed_edges, directed=True)
print("\nAdjacency matrix of directed graph:")
print(directed_adjacency_matrix.toarray())

In this version of the code, we use scipy.sparse.lil_matrix to create a sparse matrix. It stores only the non-zero elements, so it can represent large sparse matrices efficiently and save memory.

Through this optimization, we can process larger graph data without causing performance degradation or out-of-memory problems due to excessive memory usage.
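The LIL format suits filling a matrix one entry at a time. When the whole edge list is available up front, another option is to build the matrix in one step from coordinate arrays. The sketch below uses scipy.sparse.coo_matrix and converts the result to CSR; the function name edges_to_csr is hypothetical, and the code assumes the edge list contains no duplicate edges or self-loops, since repeated coordinates are summed during the conversion.

```python
import numpy as np
from scipy.sparse import coo_matrix

def edges_to_csr(edges, directed=False):
    # Assumes a non-empty list of distinct (a, b) pairs with integer labels
    edge_array = np.asarray(edges, dtype=np.int64)
    rows, cols = edge_array[:, 0], edge_array[:, 1]
    if not directed:
        # Add the reverse of every edge so the matrix is symmetric
        rows, cols = np.concatenate([rows, cols]), np.concatenate([cols, rows])
    num_nodes = int(edge_array.max()) + 1
    data = np.ones(len(rows), dtype=np.int8)
    # COO takes (data, (row, col)) triplets; CSR is better suited to arithmetic
    return coo_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes)).tocsr()

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
sparse_adjacency = edges_to_csr(edges)
print(sparse_adjacency.toarray())
print(sparse_adjacency.nnz)  # number of stored entries
```

Only the stored entries take up space, so for this graph the matrix holds 8 values instead of all 16 cells of the dense 4 x 4 array.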

Process weighted edge lists

In some cases, the edges of the graph not only represent the connection relationships between nodes, but may also have weight information. For example, in a transportation network, edges can represent roads, and weights can represent the length or travel time of a road.

Let's see how we can modify the code to support weighted edge lists:

import numpy as np
from scipy.sparse import lil_matrix

def edges_to_adjacency_matrix(edges, directed=False, weighted=False):
    max_node = max(max(edge[0], edge[1]) for edge in edges) + 1
    adjacency_matrix = lil_matrix((max_node, max_node), dtype=np.float32)
    for edge in edges:
        if directed:
            if weighted:
                adjacency_matrix[edge[0], edge[1]] = edge[2]
            else:
                adjacency_matrix[edge[0], edge[1]] = 1
        else:
            if weighted:
                adjacency_matrix[edge[0], edge[1]] = edge[2]
                adjacency_matrix[edge[1], edge[0]] = edge[2]
            else:
                adjacency_matrix[edge[0], edge[1]] = 1
                adjacency_matrix[edge[1], edge[0]] = 1
    return adjacency_matrix

# test
weighted_edges = [(0, 1, 5), (0, 2, 3), (1, 2, 2), (2, 3, 7)]
weighted_adjacency_matrix = edges_to_adjacency_matrix(weighted_edges, weighted=True)
print("Weighted adjacency matrix:")
print(weighted_adjacency_matrix.toarray())

In this version of the code, we added a weighted parameter to indicate whether the edges carry weights. If weighted is True, the weight is read from the third element of each edge tuple and stored in the adjacency matrix. Otherwise, the values in the adjacency matrix still represent the presence or absence of an edge.

With this modification, we can process graph data with weight information and retain this information in the adjacency matrix for subsequent analysis and calculations.
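As one example of such a subsequent calculation, the sketch below sums each row of a weighted adjacency matrix to get the total weight attached to each node. It uses a dense NumPy array for brevity, and the variable names weights and strength are illustrative.

```python
import numpy as np

weighted_edges = [(0, 1, 5), (0, 2, 3), (1, 2, 2), (2, 3, 7)]
num_nodes = max(max(a, b) for a, b, _ in weighted_edges) + 1

weights = np.zeros((num_nodes, num_nodes), dtype=np.float32)
for a, b, w in weighted_edges:
    # Undirected graph: store the weight in both directions
    weights[a, b] = w
    weights[b, a] = w

# Row sums give the total weight of the edges attached to each node
strength = weights.sum(axis=1)
print(strength)
```

Note that in this representation an edge with weight 0 cannot be distinguished from a missing edge, so zero weights need separate handling if they are meaningful in the data.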

Visualization of graphs

When dealing with graph data, visualization is a powerful tool that can help us intuitively understand the structure and characteristics of the graph. There are many libraries in Python that can be used to visualize graph data, among which NetworkX is a commonly used library that provides rich functions to create, manipulate and visualize graphs.

Let's see how to use NetworkX to visualize our generated adjacency matrix:

import networkx as nx
import matplotlib.pyplot as plt

def visualize_adjacency_matrix(adjacency_matrix):
    # from_numpy_array replaces from_numpy_matrix, which was removed in NetworkX 3.0
    G = nx.from_numpy_array(adjacency_matrix)
    pos = nx.spring_layout(G)  # Compute node positions
    nx.draw(G, pos, with_labels=True, node_color='skyblue', node_size=500, font_size=10)  # Draw the graph
    edge_labels = {(i, j): w['weight'] for i, j, w in G.edges(data=True)} # Get edge weights
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=10) # Draw edge weights
    plt.title("Graph Visualization")
    plt.show()

# test
weighted_edges = [(0, 1, 5), (0, 2, 3), (1, 2, 2), (2, 3, 7)]
weighted_adjacency_matrix = edges_to_adjacency_matrix(weighted_edges, weighted=True)
print("Weighted adjacency matrix:")
print(weighted_adjacency_matrix.toarray())

visualize_adjacency_matrix(weighted_adjacency_matrix.toarray())

In this code, we first convert the adjacency matrix into a graph object using NetworkX's matrix conversion function (from_numpy_array in current releases; older releases called it from_numpy_matrix). We then use spring_layout to compute the node positions and draw the graph with the draw function. Finally, we plot the edge weights using the draw_networkx_edge_labels function.

Through visualization, we can clearly see the structure of the graph and intuitively understand the connection relationships and weight information between nodes.
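The visualization function above builds an undirected graph. For a directed adjacency matrix, one possible approach is to ask NetworkX for a directed graph class during the conversion. The sketch below assumes a NetworkX release that provides from_numpy_array and only inspects the resulting edges rather than drawing them; the matrix values are made up for illustration.

```python
import numpy as np
import networkx as nx

# Directed adjacency matrix: entry [i, j] is the weight of the edge i -> j
directed_matrix = np.array([[0, 5, 3, 0],
                            [0, 0, 2, 0],
                            [0, 0, 0, 7],
                            [0, 0, 0, 0]], dtype=np.float32)

# create_using selects the graph class; nx.DiGraph preserves edge direction
G = nx.from_numpy_array(directed_matrix, create_using=nx.DiGraph)
directed_edges = sorted(G.edges(data="weight"))
print(directed_edges)
```

The resulting graph object can be laid out and drawn with the same spring_layout, draw and draw_networkx_edge_labels calls used above.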

Convert adjacency matrix to raw edge list

In graph data processing, sometimes we need to convert the adjacency matrix back to its original edge list form. This may be useful in certain algorithms and applications, as some algorithms may be better suited to using edge lists to represent the graph.

Let's see how to write code to achieve this conversion:

import numpy as np

def adjacency_matrix_to_edges(adjacency_matrix):
    edges = []
    for i in range(adjacency_matrix.shape[0]):
        for j in range(adjacency_matrix.shape[1]):
            if adjacency_matrix[i, j] != 0:
                edges.append((i, j, adjacency_matrix[i, j]))
    return edges

# test
adjacency_matrix = np.array([[0, 1, 0, 0],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [0, 0, 1, 0]], dtype=np.float32)
print("Original adjacency matrix:")
print(adjacency_matrix)

edges = adjacency_matrix_to_edges(adjacency_matrix)
print("\nConverted edge list:")
print(edges)

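In this code, we iterate through each element of the adjacency matrix, and whenever an element is non-zero we add a corresponding edge to the edge list. The element's value is stored as the third item of the tuple, so weights are preserved for weighted graphs. Because every entry is visited, an undirected graph with a symmetric matrix, such as the one in the test, yields each edge twice: once as (i, j) and once as (j, i).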

Through this conversion process, we can convert the graph represented by the adjacency matrix into the form of an edge list, thereby facilitating the implementation and application of some algorithms.
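For an undirected graph it is often enough to list each edge once. The following sketch, which assumes a symmetric dense NumPy matrix, keeps only the upper triangle and lets NumPy locate the non-zero entries instead of using nested loops. The function name undirected_matrix_to_edges is hypothetical.

```python
import numpy as np

def undirected_matrix_to_edges(adjacency_matrix):
    # Assumes a symmetric matrix; keep only the upper triangle (including the
    # diagonal) so that each undirected edge is reported once
    upper = np.triu(adjacency_matrix)
    rows, cols = np.nonzero(upper)
    weights = upper[rows, cols]
    # tolist() converts NumPy scalars to plain Python ints and floats
    return list(zip(rows.tolist(), cols.tolist(), weights.tolist()))

adjacency_matrix = np.array([[0, 1, 0, 0],
                             [1, 0, 1, 0],
                             [0, 1, 0, 1],
                             [0, 0, 1, 0]], dtype=np.float32)
edges = undirected_matrix_to_edges(adjacency_matrix)
print(edges)
```

For a directed graph the np.triu step would have to be dropped, since entries below the diagonal then describe distinct edges.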

Summary and Outlook

This article showed how to use Python to convert a raw edge list into an adjacency matrix, and then extended and optimized the conversion for different scenarios. We covered several aspects of graph data processing: handling undirected and directed graphs, supporting weighted edge lists, using sparse matrices to reduce memory usage, visualizing graphs, and converting an adjacency matrix back to an edge list.

In practical applications, graph data processing is a very important and widely used field, involving network analysis, social networks, transportation planning, bioinformatics and many other fields. Mastering the skills of graph data processing can help us better understand and analyze complex data structures to solve practical problems.

In the future, as the scale and complexity of data continue to increase, the field of graph data processing will face more challenges and opportunities. We can expect more efficient, flexible, and feature-rich tools and algorithms to emerge to address changing needs and challenges. At the same time, we can also continue to learn and explore, constantly improve our abilities and levels in the field of graph data processing, and make greater contributions to solving practical problems.

I hope this article will help you understand and apply graph data processing. You are also welcome to further study and explore this field and contribute to the development of data science and engineering.
