PyTorch Neural Network Practical Study Notes_43 Graph Neural Network DGL Library: Introduction + Installation + Uninstallation + Datasets + PyG Library + NetworkX Library

The DGL library is a graph neural network framework jointly released by New York University and Amazon. It supports heterogeneous graphs and provides open-source code for related heterogeneous graph neural networks. Its implementations of well-known industry models such as GCMC and RGCN also perform well.

1 DGL library

1.1 Implementation and performance of DGL library

Implementing a GNN is not easy, because it requires high GPU throughput on irregular data.

1.1.1 Introduction to DGL library

The logic layer of the DGL library processes graphs in the vertex domain, which makes the code easier to understand. A lot of work has also gone into low-level memory usage and runtime efficiency so that the framework performs well.

1.1.2 DGL library features

GCMC: DGL's memory optimization supports training on the MovieLens10M dataset on a single GPU (the original implementation had to load data dynamically from the CPU), reducing training time from 24 hours to just over 1 hour.

RGCN: RGCN is reimplemented with the new heterogeneous graph interface, which reduces memory overhead.

HAN: The flexible interface can transform a heterogeneous graph into a homogeneous graph through a metapath.

Metapath2vec: The new metapath sampling implementation is 2x faster than the original implementation.
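To make the HAN item above more concrete, here is a minimal pure-Python sketch of what "turning a heterogeneous graph into a homogeneous graph through a metapath" means. It is not DGL's interface. The toy author and paper names, the `writes` edge list, and the `metapath_neighbors` helper are all made up for illustration. The metapath used is author-paper-author: two authors become neighbors when they share a paper.

```python
from itertools import combinations

# Hypothetical heterogeneous graph: "writes" edges from authors to papers.
writes = [
    ("alice", "paper1"),
    ("bob", "paper1"),
    ("bob", "paper2"),
    ("carol", "paper2"),
    ("dave", "paper3"),
]


def metapath_neighbors(author_paper_edges):
    """Connect two authors when they share a paper (author-paper-author)."""
    authors_by_paper = {}
    for author, paper in author_paper_edges:
        authors_by_paper.setdefault(paper, set()).add(author)

    homogeneous_edges = set()
    for authors in authors_by_paper.values():
        # Every pair of authors of the same paper is linked once.
        for a, b in combinations(sorted(authors), 2):
            homogeneous_edges.add((a, b))
    return sorted(homogeneous_edges)


coauthor_edges = metapath_neighbors(writes)
print(coauthor_edges)
```

The result contains only author vertices, so an ordinary homogeneous GNN layer could be applied to it.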

1.1.3 Molecular Chemistry Model Library DGL-Chem

The molecular chemistry library provides pre-trained models for molecular property prediction and molecular structure generation. There is also DGL-KE, a dedicated package for training knowledge graph embeddings, whose performance stands out.

On a single GPU, DGL-KE trains FB15K graph embeddings with the classic TransE model in 7 minutes, whereas GraphVite (v0.1.0) takes 14 minutes on 4 GPUs.

The first release of DGL-KE includes the TransE, ComplEx and DistMult models and supports CPU training, GPU training, mixed CPU and GPU training, and single-machine multi-process training.

1.2 Install DGL library

1.2.1 Check the local CUDA version

Enter the following in CMD:

nvcc --version
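The same check can be scripted. The sketch below uses only the Python standard library and assumes that `nvcc` prints a line containing "release X.Y", as current CUDA toolkits do. The helper names `parse_cuda_release`, `detect_cuda_version` and `conda_package_name` are hypothetical, and the `dgl-cudaX.Y` pattern is simply the naming used by the conda commands in these notes.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the "X.Y" release number from the text printed by nvcc."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def detect_cuda_version():
    """Return the CUDA release reported by nvcc, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_cuda_release(result.stdout)


def conda_package_name(cuda_version):
    """Build a package name in the dgl-cudaX.Y style used in these notes."""
    return f"dgl-cuda{cuda_version}"


version = detect_cuda_version()
print("CUDA release:", version)
if version is not None:
    print("Candidate package:", conda_package_name(version))
```

Whether a package with that exact name exists for a given CUDA release still has to be checked against the channel listing.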

1.2.2 Find the matching DGL version

The DGL builds available for each CUDA version (64-bit) are listed at https://conda.anaconda.org/dglteam/linux-64

1.2.3 Install the matching version

conda install -c dglteam dgl-cuda11.3
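After installation, it can be useful to confirm which DGL version Python actually imports. The following is a minimal sketch; `module_version` is a hypothetical helper, and it relies only on the `__version__` attribute that DGL exposes. It returns None instead of raising when the package is missing or fails to load.

```python
import importlib


def module_version(name):
    """Return module.__version__ if the module imports, otherwise None."""
    try:
        module = importlib.import_module(name)
    except (ImportError, OSError):
        # ImportError: package not installed; OSError: a native library failed to load.
        return None
    return getattr(module, "__version__", None)


dgl_version = module_version("dgl")
print("dgl:", dgl_version)
```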

1.2.4 Uninstall DGL

If the installed DGL version is not the one you need (for example, 0.7.1 is installed and you want 0.4.3), remove the existing version first:

Remove the currently installed DGL: conda uninstall -c dglteam dgl-cuda10.2

(adjust cuda10.2 to match your own environment)

Remove a specific version: conda uninstall -c dglteam dgl-cuda10.2==0.5.0

(adjust cuda10.2==0.5.0 to match your own environment; use conda list to see the version currently installed)

1.3 Datasets in the DGL library

1.3.1 SST (Stanford Sentiment Treebank)

Each sample is a sentence in tree form, and leaf vertices represent words. Each vertex also carries a sentiment annotation in one of 5 categories (very negative, negative, neutral, positive, very positive).

1.3.2 KarateClub

The dataset contains a single graph, whose vertices describe whether a user in a social network is a member of a karate club.

1.3.3 CitationGraph

Vertices represent papers, and edges represent citation relationships.

1.3.4 CORA

Vertices represent papers, and edges represent citation relationships between papers.

1.3.5 CoraFull

An extension of the CORA dataset, where vertices represent papers and edges represent citation relationships between papers.

1.3.6 AmazonCoBuy

Vertices represent products, and an edge connects two products that are frequently purchased together. Vertex features represent product reviews, and vertex category labels represent product categories.

1.3.7 Coauthor

Vertices represent authors, and edges represent relationships of co-authored papers. Vertex features represent keywords in the author's paper, and vertex category labels represent the author's research field.

1.3.8 MiniGCDataset (mini graph classification dataset)

Contains 8 different types of graphs: cycle graph, star graph, wheel graph, lollipop graph, hypercube graph, grid graph, clique (complete) graph and circular ladder graph.
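For readers unfamiliar with these graph families, the sketch below builds one small instance of each with NetworkX generators. It only illustrates the shapes and makes no claim about how the dataset class generates its samples. The sizes are arbitrary choices, and the complete graph stands in for the clique type.

```python
import networkx as nx

# One small example per graph family; sizes are arbitrary.
examples = {
    "cycle": nx.cycle_graph(6),
    "star": nx.star_graph(5),               # 1 hub + 5 leaves
    "wheel": nx.wheel_graph(6),             # 1 hub + a 5-node rim
    "lollipop": nx.lollipop_graph(4, 2),    # complete graph K4 + a 2-node tail
    "hypercube": nx.hypercube_graph(3),     # 3-dimensional cube
    "grid": nx.grid_graph(dim=(2, 3)),      # 2 x 3 lattice
    "clique": nx.complete_graph(6),
    "circular ladder": nx.circular_ladder_graph(3),
}

sizes = {
    name: (g.number_of_nodes(), g.number_of_edges())
    for name, g in examples.items()
}
for name, (num_nodes, num_edges) in sizes.items():
    print(f"{name}: {num_nodes} nodes, {num_edges} edges")
```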

1.3.9 TUDataset

Graph kernel datasets for graph classification.

1.3.10 GINDataset (graph isomorphism network dataset)

A compact subset of the graph kernel datasets. It provides popular graph kernel datasets in a compact format, including 4 bioinformatics datasets (MUTAG, NCI1, PROTEINS, PTC) and 5 social network datasets (COLLAB, IMDBBINARY, IMDBMULTI, REDDITBINARY, REDDITMULTI5K).

1.3.11 PPIDataset (protein-protein interaction dataset)

The dataset contains 24 graphs with an average of 2372 vertices per graph. Each vertex has 50 features and 121 labels.

1.3.12 QM7b

Consists of 7211 molecules, each with 14 regression targets. Vertices represent atoms and edges represent bonds.

1.4 Loading of datasets in DGL library

A dataset can be instantiated directly through its dataset class in the dgl.data module.
The instantiation parameters follow the constructor definition of each dataset class.
The code is as follows:

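# Create and load a homogeneous-graph dataset. When run, this code automatically downloads
# the specified dataset, extracts it, loads it into memory, and returns the dataset object.
# The dataset class is compatible with PyTorch's Dataset class.
from dgl.data import GINDataset

dataset = GINDataset('MUTAG', self_loop=True)  # MUTAG dataset, with self-loops added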

1.4.1 Tips for dataset loading

The dataset classes in the dgl.data module are not organized consistently. Some classes are exposed directly under data, while others are wrapped in an additional layer.

For example, the CoraDataset class is defined in the citation_graph.py file, so loading it requires the following code:

from dgl.data import citation_graph
data = citation_graph.CoraDataset()

When executed, this code reads the specified dataset, generates an adjacency matrix, and then calls the NetworkX module to build the graph from that adjacency matrix, along with the training and test datasets.

Therefore, when using a DGL dataset, you also need to look through the dgl/data path yourself; the actual code in the installed library is the final authority.
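The loading step described above (adjacency matrix, then a NetworkX graph, then a train/test split) can be illustrated with a small stand-alone sketch. This is not the code inside citation_graph.py. The 4-node `adjacency` matrix and the half-and-half split are assumptions chosen only to show the idea.

```python
import networkx as nx

# Hypothetical symmetric adjacency matrix (1 = edge present).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

graph = nx.Graph()
graph.add_nodes_from(range(len(adjacency)))
for i, row in enumerate(adjacency):
    for j, value in enumerate(row):
        if value and i < j:  # undirected graph: add each edge once
            graph.add_edge(i, j)

# Illustrative split: first half of the nodes for training, the rest for testing.
nodes = sorted(graph.nodes())
half = len(nodes) // 2
train_nodes, test_nodes = nodes[:half], nodes[half:]

print(graph.number_of_edges(), train_nodes, test_nodes)
```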

1.5 Graphs in the DGL library (DGLGraph)

The DGLGraph class encapsulates a unique graph structure, which can be understood as the core of the DGL library. Most of the graph neural networks in the DGL library are implemented based on the DGLGraph class.

1.6 Built-in functions in the DGL library

The DGL library provides a large number of built-in functions, mainly for operations on edges and vertices. They are much more efficient than ordinary graph processing functions.

The built-in functions are located in the dgl.function module. They are used together with the message propagation mechanism of DGLGraph.

The message propagation mechanism belongs to the underlying function of the DGL library and is often used in the construction of graph neural network models.

If you only use the graph neural network model packaged in the DGL library, you don't need to understand it.
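For readers who do want a feel for message propagation, the following pure-Python sketch mimics one round of it on a toy graph: a message function copies each source vertex feature onto its outgoing edges, and a reduce function sums the messages arriving at each destination. It is a conceptual illustration, not DGL code. The names `copy_source` and `sum_at_destination` are made up here, loosely mirroring the copy and sum style of built-ins found in dgl.function.

```python
# Toy directed graph as (source, destination) pairs, with one scalar feature per vertex.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}


def copy_source(edge_list, node_features):
    """Message function: each edge carries its source vertex's feature."""
    return [(dst, node_features[src]) for src, dst in edge_list]


def sum_at_destination(messages, node_ids):
    """Reduce function: each vertex sums the messages it received."""
    totals = {node: 0.0 for node in node_ids}
    for dst, value in messages:
        totals[dst] += value
    return totals


messages = copy_source(edges, features)
aggregated = sum_at_destination(messages, features)
print(aggregated)
```

Vertex 0 has no incoming edges, so its aggregated value stays at 0.0, while vertex 3 receives messages from both vertex 2 and vertex 0.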

2 PyG library

The PyG library is a geometric deep learning extension library built on PyTorch that can leverage specialized CUDA kernels to achieve high performance.

Following a simple message passing API, it bundles most of the recently proposed convolutional and pooling layers into a unified framework. The framework supports both CPU and GPU computation and follows an immutable data flow paradigm that allows the graph structure to change dynamically over time.

3 NetworkX library

NetworkX is a graph theory and complex network modeling tool written in Python. It has common graph and complex network analysis algorithms built in, and makes tasks such as complex network data analysis and simulation modeling straightforward.

With NetworkX, you can store networks in standardized and non-standardized data formats, generate a variety of random and classical networks, analyze network structures, build network models, design new network algorithms, and draw networks.

3.1 Installation and use of the NetworkX library

Since the NetworkX library is bundled with Anaconda by default, it can be used directly once Anaconda is installed.

3.2 Querying the version of the NetworkX library

import networkx
print(networkx.__version__)
# 2.7.1

3.3 Graph structures supported by the NetworkX library

  1. Graph: Undirected graph without multiple edges.
  2. DiGraph: Directed graph without multiple edges.
  3. MultiGraph: Undirected graph with multiple edges.
  4. MultiDiGraph: Directed graph with multiple edges.
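The difference between the four classes is easiest to see by feeding the same edge list to each of them. In this short sketch the edge list is an arbitrary example containing a repeated edge and a reversed edge.

```python
import networkx as nx

# (0, 1) appears twice, and (1, 0) is the same pair in the opposite direction.
edges = [(0, 1), (0, 1), (1, 0)]

edge_counts = {}
for graph_class in (nx.Graph, nx.DiGraph, nx.MultiGraph, nx.MultiDiGraph):
    g = graph_class()
    g.add_edges_from(edges)
    edge_counts[graph_class.__name__] = g.number_of_edges()
    print(graph_class.__name__, g.number_of_edges(), g.is_directed(), g.is_multigraph())
```

Graph collapses all three entries into one edge, DiGraph keeps one edge per direction, and the two multigraph classes keep every entry.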

3.4 Graph data objects in the NetworkX library

Graph data objects in the NetworkX library can be converted into strings in GraphML file format through the nx.generate_graphml interface. The result is returned as a generator, and each line of the GraphML document is one element of the generator.

import networkx as nx

G = nx.path_graph(4)
print(list(nx.generate_graphml(G)))

After the code is executed, the graph data object is output in GraphML format, as follows:

['<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">', '  <graph edgedefault="undirected">', '    <node id="0" />', '    <node id="1" />', '    <node id="2" />', '    <node id="3" />', '    <edge source="0" target="1" />', '    <edge source="1" target="2" />', '    <edge source="2" target="3" />', '  </graph>', '</graphml>']

The GraphML format gives the graph data a textual representation. The graph can then be maintained by editing the content of the GraphML file directly, which is more direct and flexible than going through the interface functions.

3.4.1 Persistence of GraphML files

Use the nx.write_graphml interface to write an in-memory graph object to a file. After editing, use the nx.read_graphml interface to load the file back into memory.

3.4.2 How to open a GraphML file

A GraphML file is in XML format and can be opened with the yEd Graph Editor software.
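A minimal round trip might look like the sketch below. The temporary directory and the file name `path_graph.graphml` are arbitrary choices. Because GraphML stores vertex ids as text, `node_type=int` is passed here so that the ids come back as integers. Without it they would be read as strings.

```python
import os
import tempfile

import networkx as nx

G = nx.path_graph(4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "path_graph.graphml")
    nx.write_graphml(G, path)                  # save the in-memory graph
    H = nx.read_graphml(path, node_type=int)   # load it back, ids converted to int

print(sorted(H.nodes()), H.number_of_edges())
```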

Origin blog.csdn.net/qq_39237205/article/details/123874778