【Paper Introduction】- Variational Graph Recurrent Neural Networks (VGRNN)

Article information

Variational Graph Recurrent Neural Networks (VGRNN)

Original address: Variational Graph Recurrent Neural Networks (VGRNN): https://arxiv.org/abs/1908.09710
Source code: https://github.com/VGraphRNN/VGRNN

Abstract

Representation learning over graph structured data has been mostly studied in static graph settings while efforts for modeling dynamic graphs are still scant. In this paper, we develop a novel hierarchical variational model that introduces additional latent random variables to jointly model the hidden states of a graph recurrent neural network (GRNN) to capture both topology and node attribute changes in dynamic graphs. We argue that the use of high-level latent random variables in this variational GRNN (VGRNN) can better capture potential variability observed in dynamic graphs as well as the uncertainty of node latent representation. With semi-implicit variational inference developed for this new VGRNN architecture (SI-VGRNN), we show that flexible non-Gaussian latent representations can further help dynamic graph analytic tasks. Our experiments with multiple real-world dynamic graph datasets demonstrate that SI-VGRNN and VGRNN consistently outperform the existing baseline and state-of-the-art methods by a significant margin in dynamic link prediction.

Background

Graph convolutional recurrent networks (GCRN)

GCRN is used to model time series data defined on the nodes of a static graph. A sequence of frames in a video and spatiotemporal measurements on a sensor network are two examples of such datasets. GCRN combines graph convolutional networks (GCNs) with recurrent neural networks (RNNs) to capture spatial and temporal features in the data. More precisely, given a graph G with N nodes, whose topology is determined by an adjacency matrix A, and a sequence of node attributes X = {X(1), X(2), . . . , X(T)}, GCRN reads the M-dimensional node attributes X(t) and updates its hidden state h(t) at each time step t:
h(t) = f(A, X(t), h(t−1))    (1)
here f is a non-probabilistic deep neural network. It can be any recurrent network, including gated activation functions, such as long short-term memory (LSTM) or gated recurrent unit (GRU), where the deep layers inside them are replaced by graph convolutional layers. GCRN models node attribute sequences by parameterizing the factorization of the joint probability distribution as a product of conditional probabilities.
p(X(1), X(2), . . . , X(T) | A) = ∏_{t=1}^{T} p(X(t) | X(<t), A),   p(X(t) | X(<t), A) = g(A, h(t−1))

Due to the determinism of the transition function f, the choice of the mapping function g here effectively defines the family of joint probability distributions p(X(1), X(2), . . . , X(T) | A) that can be represented by a standard GCRN. For highly variable sequences this can be problematic. More specifically, when the variability in X is high, the model tries to map this variability into the hidden state h, leading to potentially high variability in h, which in turn leads to overfitting to the training data. Therefore, GCRN is not fully capable of modeling sequences with high variation. This fundamental problem of autoregressive models has been addressed for non-graph-structured datasets by introducing stochastic hidden states into the model.
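
To make the idea of "RNN layers replaced by graph convolutional layers" concrete, here is a minimal sketch, not the authors' implementation, of a GRU-style GCRN cell. The symmetrically normalized adjacency A_hat, all dimensions, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphGRUCell(nn.Module):
    """A GRU cell whose gate transformations use graph-convolved inputs and hidden states."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.W_xz, self.W_hz = nn.Linear(in_dim, hid_dim, bias=False), nn.Linear(hid_dim, hid_dim)
        self.W_xr, self.W_hr = nn.Linear(in_dim, hid_dim, bias=False), nn.Linear(hid_dim, hid_dim)
        self.W_xh, self.W_hh = nn.Linear(in_dim, hid_dim, bias=False), nn.Linear(hid_dim, hid_dim)

    def forward(self, A_hat, X_t, H_prev):
        # one-hop graph convolution: aggregate over neighbors, then apply a linear map
        agg_x, agg_h = A_hat @ X_t, A_hat @ H_prev
        z = torch.sigmoid(self.W_xz(agg_x) + self.W_hz(agg_h))                  # update gate
        r = torch.sigmoid(self.W_xr(agg_x) + self.W_hr(agg_h))                  # reset gate
        h_tilde = torch.tanh(self.W_xh(agg_x) + self.W_hh(A_hat @ (r * H_prev)))
        return (1 - z) * H_prev + z * h_tilde                                   # h(t)
```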

In this paper, we integrate GCN and RNN into a graph RNN (GRNN) framework, which is a dynamic graph autoencoder model. While GCRN aims to model dynamic node attributes defined on a static graph, GRNN can take different adjacency matrices at different time snapshots as input and reconstructs the graph at time t by employing an inner product decoder on the hidden state h(t). More specifically, h(t) can be viewed as the node embeddings of the dynamic graph at time t. To further improve the expressive power of GRNN, we introduce stochastic latent variables by combining GRNN with the variational graph autoencoder (VGAE). In this way, not only can we capture temporal dependencies between graphs without assuming smoothness, but each node is also represented by a distribution in the latent space. Furthermore, the prior construction designed in VGRNN enables it to predict links at future time steps.
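
As an illustration of the inner product decoder mentioned above, a minimal sketch could look like the following; the shapes and names are assumptions, and the actual VGRNN code may differ.

```python
import torch

def inner_product_decoder(H_t: torch.Tensor) -> torch.Tensor:
    """Decode node embeddings H_t of shape [N_t, D] into an [N_t, N_t] matrix
    of link probabilities for the graph snapshot at time t."""
    return torch.sigmoid(H_t @ H_t.T)
```

Training would then typically compare this reconstruction against the observed adjacency matrix A(t) with a (weighted) cross-entropy loss.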

Semi-implicit variational inference (SIVI)

SIVI has been shown to effectively learn posterior distributions with skewness, kurtosis, multimodality, and other features that cannot be captured by existing variational inference methods. To characterize the latent posterior q(z|x), SIVI introduces a mixing distribution over the parameters of the original posterior distribution to expand the variational family with a hierarchical structure: z ∼ q(z|ψ) with ψ ∼ q_φ(ψ), where φ denotes the parameters of the distribution to be inferred. While the original posterior q(z|ψ) is required to have an analytical form, its mixing distribution is not subject to this constraint, so the marginal posterior distribution is often implicit and more expressive than analytical density functions. It is also common for the marginal of the hierarchy to be implicit even when both the posterior and the mixing distributions are explicit. We integrate SIVI into our new model to infer more flexible and interpretable node embeddings for dynamic graphs.
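
A minimal sketch of this hierarchical sampling, assuming a reparameterized Gaussian layer for q(z|ψ) and an MLP that pushes random noise forward to obtain ψ; all dimensions and architectures are illustrative assumptions.

```python
import torch
import torch.nn as nn

noise_dim, psi_dim, z_dim, n_samples = 16, 32, 8, 128

# mixing distribution q_phi(psi): implicit, defined by pushing noise through a network
mixing_net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, psi_dim))
mu_layer, logvar_layer = nn.Linear(psi_dim, z_dim), nn.Linear(psi_dim, z_dim)

eps = torch.randn(n_samples, noise_dim)                  # random noise
psi = mixing_net(eps)                                    # psi ~ q_phi(psi)
mu, logvar = mu_layer(psi), logvar_layer(psi)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # z ~ q(z | psi), explicit Gaussian layer
```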

Variational graph recurrent neural network (VGRNN)

We consider a dynamic graph G = {G(1), G(2), . . . , G(T)}, where G(t) = (V(t), E(t)) is the graph at time step t, and V(t) and E(t) are the corresponding sets of nodes and edges, respectively. In this paper, we aim to develop a model that is generally compatible with potential changes in the node and edge sets. There are no constraints on the relationship between (V(t), E(t)) and (V(t−1), E(t−1)); i.e., new nodes can join the dynamic graph and create edges to existing nodes, or existing nodes can disappear from the graph.

On the other hand, new edges can form between snapshots and existing edges can disappear. Let N_t denote the number of nodes, i.e., the cardinality of V(t), at time step t. Therefore, VGRNN can take a sequence of adjacency matrices of varying sizes, A = {A(1), A(2), . . . , A(T)}, as input. Furthermore, when node attributes are considered, different attributes can be observed at different snapshots, giving a node attribute sequence X = {X(1), X(2), . . . , X(T)} of varying sizes. Note that A(t) and X(t) are N_t × N_t and N_t × M matrices, respectively, where M is the dimension of the node attributes, which is constant over time. Inspired by variational recurrent neural networks (VRNNs), we build VGRNN by integrating GRNN and VGAE to jointly model the complex dependencies between the topological and node attribute dynamics. Furthermore, each node is represented with a distribution at each time step, so the uncertainty of the node's latent representation is also modeled in VGRNN.
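
Because N_t can change across snapshots, a dynamic graph is most naturally stored as a list of per-snapshot tensors rather than one stacked tensor; a tiny illustrative sketch, with shapes chosen only as an example:

```python
import torch

# T = 3 snapshots with N_1 = 4, N_2 = 5, N_3 = 5 nodes and M = 7 node attributes
A_seq = [torch.zeros(4, 4), torch.zeros(5, 5), torch.zeros(5, 5)]   # adjacency matrices A(t)
X_seq = [torch.randn(4, 7), torch.randn(5, 7), torch.randn(5, 7)]   # node attributes X(t)

for A_t, X_t in zip(A_seq, X_seq):
    assert A_t.shape[0] == A_t.shape[1] == X_t.shape[0]             # N_t x N_t and N_t x M
```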

VGRNN model

The VGRNN model uses VGAE to model each graph snapshot. VGAEs are conditioned on the state variable ht−1 over time, modeled by a GRNN. Such an architectural design will help each VGAE consider the temporal structure of dynamic graphs. More importantly, unlike standard VGAE, VGAE in VGRNN adopts new priors on latent random variables by allowing distribution parameters to be modeled by explicit or implicit complex functions of information from previous time steps. More specifically, instead of imposing a standard multivariate Gaussian distribution with deterministic parameters, VGAE in VGRNN learns the prior distribution parameters from the hidden states in previous time steps. Therefore, our VGRNN allows a more flexible latent representation with greater expressive power that can capture dependencies between and within topological and node attribute evolution processes. In particular, we can write the construction of the prior distribution used in the experiment as follows:
p(Z(t)) = ∏_{i=1}^{N_t} p(z_i(t)),   p(z_i(t)) = N(μ_{i,prior}(t), diag(σ_{i,prior}(t))²),   {μ_prior(t), σ_prior(t)} = φ_prior(h(t−1))    (2)

where μ_prior(t) and σ_prior(t) denote the parameters of the conditional prior distribution.

Additionally, the generating distribution will be conditioned on Z(t):

A(t) | Z(t) ∼ Bernoulli(π(t)),   π(t) = φ_dec(Z(t))    (3)

where π(t) denotes the parameters of the generating distribution; φ_prior and φ_dec can be any highly flexible functions, such as neural networks.
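
For illustration, φ_prior and φ_dec could be instantiated as below; the choice of a small MLP for the prior and an inner-product form for the decoder, as well as all layer sizes, are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """phi_prior: maps the previous hidden state h(t-1) to the prior parameters (mu, sigma)."""
    def __init__(self, hid_dim, z_dim):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU())
        self.mu = nn.Linear(hid_dim, z_dim)
        self.log_sigma = nn.Linear(hid_dim, z_dim)

    def forward(self, h_prev):                      # h_prev: [N_t, hid_dim]
        s = self.base(h_prev)
        return self.mu(s), torch.exp(self.log_sigma(s))

def phi_dec(Z_t: torch.Tensor) -> torch.Tensor:
    """phi_dec: an inner-product decoder mapping Z(t) to edge probabilities pi(t)."""
    return torch.sigmoid(Z_t @ Z_t.T)
```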

On the other hand, the backbone GRNN can flexibly model complex dependencies, including graph topology dynamics and node attribute dynamics. GRNN updates its hidden state using a recursive equation:
h(t) = f(A(t), φ_x(X(t)), φ_z(Z(t)), h(t−1))    (4)
where f is the transition function from equation (1). Unlike GCRN, the graph topology can change at different time steps, as in real-world dynamic graphs, and the adjacency matrix A(t) is time-dependent in VGRNN. To further enhance the expressive power, φ_x and φ_z are deep neural networks that operate independently on each node and extract features from X(t) and Z(t), respectively. These feature extractors are crucial for learning complex graph dynamics. Based on (4), h(t) is a function of A(≤t), X(≤t), and Z(≤t). Thus, the prior and generating distributions in equations (2) and (3) define the distributions p(Z(t) | A(<t), X(<t), Z(<t)) and p(A(t) | Z(≤t), A(<t), X(<t)), respectively. The generative model can be factorized as
p(A(≤T), Z(≤T) | X(≤T)) = ∏_{t=1}^{T} p(Z(t) | A(<t), X(<t), Z(<t)) p(A(t) | Z(≤t), A(<t), X(<t))

where the prior of the first snapshot is considered to be a standard multivariate Gaussian distribution, i.e., p(z_i(1)) = N(0, I). Furthermore, if a previously unobserved node is added to the graph at snapshot t, we consider the hidden state of that node at snapshot t − 1 to be zero, and hence the node's prior at time t is N(0, I). If node deletion occurs, we assume that the identity of the node can be maintained, so deleting a node is equivalent to removing all the edges connected to it and does not affect the prior construction for the next step. More specifically, the sizes of A and X can change over time, while the dimension of the latent space is maintained over time.
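
Putting equations (2)-(4) together, one time step of the generative recurrence could be sketched as follows; the function signature, the reuse of components like those sketched above, and the wiring details are assumptions for illustration, not the authors' implementation.

```python
import torch

def vgrnn_step(A_hat_t, X_t, h_prev, prior_net, phi_x, phi_z, gru_cell):
    """One generative step of a VGRNN-style model (illustrative sketch)."""
    # (2) prior parameters computed from the previous hidden state
    mu_p, sigma_p = prior_net(h_prev)
    Z_t = mu_p + sigma_p * torch.randn_like(mu_p)        # sample z(t) from the prior
    # (3) generating distribution: Bernoulli probabilities for every node pair
    pi_t = torch.sigmoid(Z_t @ Z_t.T)
    # (4) recurrence: update the hidden state from features of X(t) and Z(t)
    h_t = gru_cell(A_hat_t, torch.cat([phi_x(X_t), phi_z(Z_t)], dim=-1), h_prev)
    return pi_t, Z_t, h_t
```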

Semi-implicit VGRNN (SI-VGRNN)

To further improve the expressive power of the variational posterior of VGRNN, we introduce the SI-VGRNN dynamic node embedding model. We impose a mixing distribution on the parameters of the variational distribution in (8) to model the posterior of VGRNN with a semi-implicit hierarchical construction:

z_i(t) ∼ q(z_i(t) | ψ_t),   ψ_t ∼ q_φ(ψ_t | A(t), X(t), h(t−1))

While the variational distribution q(Z(t) | ψ_t) is required to be explicit, the mixing distribution q_φ is not subject to such a constraint, so the marginal posterior E_{ψ_t ∼ q_φ(ψ_t | A(t), X(t), h(t−1))}[q(z(t) | ψ_t)] is often implicit and quite flexible. More specifically, SI-VGRNN draws samples from q_φ by transforming random noise ε_t through a graph neural network, which in general leads to an implicit distribution q_φ.
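
A minimal sketch of such a semi-implicit encoder, assuming the noise is injected by concatenation with the node attributes and hidden state before a single graph convolution; the injection point, layer sizes, and names are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SemiImplicitEncoder(nn.Module):
    """Draws z(t) ~ q(z | psi_t), with psi_t an implicit function of (A(t), X(t), h(t-1))."""
    def __init__(self, in_dim, hid_dim, noise_dim, z_dim):
        super().__init__()
        self.noise_dim = noise_dim
        self.gconv = nn.Linear(in_dim + hid_dim + noise_dim, hid_dim)  # weights of one graph conv layer
        self.mu = nn.Linear(hid_dim, z_dim)
        self.log_sigma = nn.Linear(hid_dim, z_dim)

    def forward(self, A_hat_t, X_t, h_prev):
        eps = torch.randn(X_t.size(0), self.noise_dim)                 # per-node random noise
        inp = torch.cat([X_t, h_prev, eps], dim=-1)
        psi_t = torch.relu(A_hat_t @ self.gconv(inp))                  # psi_t ~ q_phi(psi_t | A, X, h)
        mu, sigma = self.mu(psi_t), torch.exp(self.log_sigma(psi_t))
        return mu + sigma * torch.randn_like(mu)                       # z(t) ~ q(z(t) | psi_t)
```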
