【Paper】MVSTT: A Multiview Spatial-Temporal Transformer Network for Traffic-Flow Forecasting

Abstract

Many current methods focus on single-view or dual-view learning, which limits the learning of complex spatiotemporal features. In this work, a new multi-view spatial-temporal transformer network (MVSTT) is proposed, which can effectively learn complex spatiotemporal features, cross-domain correlations, and underlying patterns from multiple views.

  • What is single-view / dual-view learning?
  • What is multi-view learning?
    The main tasks include:
  • From the temporal perspective, a short-range gated convolution component and a long-range gated convolution component are designed.
  • From a spatial perspective, a dual-graph spatial learning module is designed to extract fixed and dynamic node spatial dependencies.
  • A spatial-temporal transformer is designed to mine different levels of spatiotemporal information through multi-view knowledge fusion.

Code: https://github.com/JianSoL/MVSTT

Introduction

Traffic flow prediction is very important for intelligent transportation systems. Under real-time dynamic changes, capturing complex spatiotemporal correlations is difficult. These correlations are caused by internal factors (e.g., people's travel habits) and external factors (e.g., weather conditions), so accurately predicting traffic flow is often very challenging. Many current methods focus on single-view or dual-view learning, which limits the learning of complex spatiotemporal features. For example, the DCRNN [8] model focuses on extracting long-range correlations from the temporal view while ignoring important information from other views, such as spatial-temporal fusion and short-range patterns. As reported in [9], learning from multiple views can capture knowledge from different domains and often yields stronger performance.
As shown in Figure 1, there is rich spatial-temporal knowledge in real-world transportation networks, including temporal patterns, spatial dependencies, and spatial-temporal correlations. As reported in [10] and [11], capturing these three types of correlations can achieve satisfactory traffic flow prediction performance. To better extract information from these three areas, we make the following observations:

  • First, short-range [12] and long-range [13] temporal patterns reflect different influences in temporal local regions [14]. For example, some park areas induce short-term congestion during holidays, while some city areas induce congestion during daily peak times, creating long-range patterns. Therefore, modeling in terms of both short- and long-range information captures more robust temporal patterns.
  • In addition, the spatial structure of actual transportation networks is very different from the predefined graphs in previous studies, including more uncertain events and dynamic evolution behaviors [15]. For example, predefined graphs are usually built based on road topology and obtain a fixed adjacency matrix (AM), while the connections between nodes in actual transportation networks may change dynamically. Static and dynamic graphs can effectively simulate different spatial structures. Furthermore, time and space dependencies do not exist independently, but are interrelated and intricate [11]. Good fusion of spatial-temporal information can reveal complex correlations and hidden dependencies. Therefore, it is crucial to dig deep into traffic characteristics across multiple perspectives and explore potential correlations between these perspectives to provide accurate and robust predictions.
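To make the static/dynamic distinction concrete, here is a minimal NumPy sketch (an illustrative construction, not necessarily the paper's): a fixed adjacency matrix built once from pairwise road distances with a thresholded Gaussian kernel, versus a dynamic adjacency recomputed from the current node features at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes = 4

# Static graph: a fixed adjacency matrix from pairwise road distances
# via a thresholded Gaussian kernel (a common construction in this
# literature; the distances here are random stand-ins).
dist = rng.uniform(1.0, 10.0, size=(num_nodes, num_nodes))
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)
sigma = dist.std()
static_adj = np.exp(-dist**2 / (2 * sigma**2))
static_adj[static_adj < 0.3] = 0.0  # sparsify weak links

# Dynamic graph: an adjacency recomputed at every step from the
# current node features, so connections can change over time.
def dynamic_adj(node_feats):
    sim = node_feats @ node_feats.T              # pairwise similarity
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    return sim / sim.sum(axis=1, keepdims=True)  # row-wise softmax

feats_t = rng.standard_normal((num_nodes, 8))
A_t = dynamic_adj(feats_t)
```

The static matrix is computed once from topology; the dynamic one changes whenever the node features do, which is what lets a dual-graph design track evolving spatial relationships.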

Fig. 1. Example of multi-view space-time correlations in a transportation network. Each dashed line represents a correlation. In the spatial view, there are rich spatial dependencies between different nodes and edges. In the temporal view, the traffic status of each node vi at a previous time t has different effects on itself and on adjacent nodes at a subsequent time t + n. In addition, the traffic flow of each node vi is affected not only by other nodes at the same time, but also by nodes with different weights at adjacent time steps. The traffic road network therefore contains rich space-time knowledge. The spatial structure of traffic road networks (for example, the distance between nodes v1 and v3) is non-Euclidean.

To address the above challenges and limitations, we consider spatiotemporal correlations from three views, namely the temporal view, the spatial view, and the spatiotemporal fusion view, and propose a novel multi-view spatial-temporal transformer (MVSTT) network for traffic prediction. MVSTT learns spatiotemporal features from these three views and combines graph neural network (GNN)-based modules with transformers for spatiotemporal information fusion. Furthermore, we divide the temporal view into two sub-views and the spatial view into two sub-views. The main contributions of this study are summarized as follows:
1) We propose a Dual Graph Space Module (DGSM) in the spatial view, which can simultaneously capture static and dynamic spatial dependencies. Static and dynamic graphs capture fixed topological dependencies and dynamically changing spatial relationships at every step of processing spatial-temporal data.
2) In the temporal view, we design a short-range gated convolution (SGC) component and a long-range gated convolution (LGC) component. The former learns short-term traffic patterns at different granularities, while the latter extracts long-term temporal dependencies across multiple time steps.
3) To effectively achieve spatial-temporal feature fusion, we further propose a spatial-temporal transformer (STT) module, in which spatial and temporal representations are deeply fused through multiple self-attention mechanisms.
4) We conduct extensive experiments on four real-world traffic datasets to evaluate the proposed model. Both our theoretical analysis and experimental results show that our method outperforms current state-of-the-art methods.
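The gated-convolution idea behind the SGC and LGC components can be sketched as a generic gated temporal convolution in the WaveNet style (a tanh "content" branch multiplied by a sigmoid "gate" branch); this is an illustrative stand-in, not the paper's exact implementation. A small dilation gives a short receptive field, a larger dilation widens it for long-range dependencies.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """1-D causal convolution along the time axis.
    x: (T, C_in), w: (K, C_in, C_out)."""
    T = x.shape[0]
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    out = np.zeros((T, c_out))
    for t in range(T):
        for k in range(K):
            out[t] += xp[t + pad - k * dilation] @ w[k]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv(x, w_filter, w_gate, dilation=1):
    # tanh branch carries content, sigmoid branch acts as a gate.
    f = np.tanh(causal_dilated_conv(x, w_filter, dilation))
    g = sigmoid(causal_dilated_conv(x, w_gate, dilation))
    return f * g

rng = np.random.default_rng(1)
x = rng.standard_normal((12, 3))             # 12 time steps, 3 channels
w_f = rng.standard_normal((2, 3, 3)) * 0.1
w_g = rng.standard_normal((2, 3, 3)) * 0.1
short = gated_conv(x, w_f, w_g, dilation=1)  # small receptive field
long_ = gated_conv(x, w_f, w_g, dilation=4)  # wider receptive field
```

Stacking several such layers with growing dilation is the usual way to cover many time steps without recurrent units.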
The remainder of this article is organized as follows. Section 2 reviews related techniques, such as graph neural networks (GNN), attention mechanisms, and traffic flow prediction. Section 3 describes our methodology in detail. Section 4 presents the experimental setup, results, and discussion. Section 5 concludes the paper and discusses future work.

Related work

In this section, we review traditional traffic flow prediction methods, as well as the popular graph neural network (GNN) technology. We also provide a brief overview of the attention mechanism separately, as it is an important foundation for transformer technology.

Spatiotemporal data prediction

Space-time data prediction has roughly gone through five stages. Some previous studies treat the space-time prediction task as a time series problem, such as ARIMA [16] and its variants [17], [18]. As reported in [18], Wang et al. combined the seasonal ARIMA model with the Holt-Winters method to exploit the advantages of both for short-term vehicle flow forecasting based on time-correlated series. Guo et al. [19] proposed an adaptive Kalman filter to implement a stochastic seasonal ARIMA plus generalized autoregressive conditional heteroskedasticity (SARIMA + GARCH) process, which performs real-time traffic prediction at 15-minute intervals. ARIMA-based methods are effective in capturing the variability of traffic flow time series, but are not robust enough in extracting the dynamic characteristics of traffic networks, such as complex nonlinearities and uncertainties. Since traffic flow prediction involves a certain degree of uncertainty, some studies were inspired to capture this nonlinear uncertainty to obtain better prediction performance. Sun et al. [20] designed a model to capture the nonlinear correlation between adjacent roads using a Bayesian network. As reported in [21] and [22], support vector regression has been successfully used to predict traffic conditions, such as hourly flows and travel times, and to predict short-term highway traffic flows under typical and atypical conditions. Later, space-time prediction using deep neural networks attracted widespread attention: deep belief networks [23] and stacked autoencoder models [24] were proposed to improve the ability to capture nonlinear features in traffic flow prediction.

Recently, Zhang et al. [25] pioneered a method called DeepST to predict traffic flow data. They divided the city into geographical grids; by counting traffic flows over fixed time intervals, a traffic flow matrix can be generated. Specifically, the DeepST method consists of a space-time component and a global component to extract space-time information and global factors across different grids (such as weekdays versus weekends). Based on this work [25], Zhang et al. [26] proposed another classic model, ST-ResNet, built on characteristics of space-time data (such as proximity and trend) and residual learning. One highlight is taking external environmental factors, such as weather conditions, into account to make forecasts more reasonable. Zonoozi et al. [27] designed a convolutional recurrent network that focuses on explicitly capturing periodic repeating patterns and multi-step predictions. Considering the large scale of prediction models and their long inference times, Pu et al. developed a lightweight encoder-decoder framework for traffic flow prediction, which improved prediction speed while ensuring accuracy [28]. However, these studies [25], [26], [27], [28], [29] consider neither the space-time dependencies of static and dynamic graphs nor the deep fusion of space-time information. Although some results have been achieved, CNN-based methods are only suitable for Euclidean space and do not fit the non-Euclidean structure of traffic networks.

Recently, graph-based space-time prediction methods have become a hot topic [1], [14], [30], [31]. Yu et al. [30] first proposed a space-time graph convolutional network (GCN) called STGCN, which replaces conventional convolution and recurrent units and enables faster space-time prediction on graph sequences. Considering the dynamic characteristics of traffic flow and the long-range temporal dependencies that CNNs or RNNs cannot capture, Wu et al. [15] designed a method called Graph WaveNet, which utilizes a novel adaptive AM learned through node embeddings to capture hidden spatial dependencies. However, as the dilation rate increases, this method loses short-range information, and it does not consider deep space-time information fusion. Moreover, most previous methods extract spatial and temporal dependencies with independent modules and lack the integration of space-time information. To solve this problem, Li and Zhu [10] designed a space-time fusion GNN that fuses spatial and temporal graphs over different time periods in a parallel manner to better learn space-time dependencies in complex traffic situations. In addition, this method utilizes the Huber loss function [32] to alleviate the missing-value problem of traffic flow data.
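Graph WaveNet's adaptive adjacency matrix, learned from two node-embedding tables roughly as A = softmax(ReLU(E1 E2^T)), can be sketched as follows. The embeddings are shown randomly initialised here; in training they would be updated end-to-end by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(2)
num_nodes, emb_dim = 5, 10

# Learnable source/target node embeddings (random stand-ins here).
E1 = rng.standard_normal((num_nodes, emb_dim))
E2 = rng.standard_normal((num_nodes, emb_dim))

def adaptive_adjacency(E1, E2):
    """A = softmax(ReLU(E1 @ E2.T)), row-normalized."""
    scores = np.maximum(E1 @ E2.T, 0.0)                 # ReLU
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    return scores / scores.sum(axis=1, keepdims=True)   # row softmax

A_adp = adaptive_adjacency(E1, E2)
```

Because the embeddings are free parameters, the resulting adjacency is not tied to the predefined road topology and can express hidden spatial dependencies.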

Nowadays, Transformer networks have achieved great success in the fields of natural language processing and computer vision [33], [34], and some scholars have begun to explore the use of Transformer networks for traffic flow prediction. Giuliari et al. [35] proposed a new Transformer network for trajectory prediction, considering both the original Transformer and a larger bidirectional Transformer. To improve the ability to learn the highly nonlinear and dynamic space-time dependencies of traffic flow, Xu et al. [36] designed a novel STT network paradigm, including a temporal Transformer component and a spatial Transformer component, which can perceive dynamic directional spatial dependencies and long-range temporal dependencies. Wang et al. [31] proposed a space-time GNN method that highlights a learnable position attention mechanism to efficiently aggregate information from adjacent roads. Inspired by Transformer technology and by learning different views of traffic flow data, the MVSTT model not only considers short-range and long-range features in the temporal view and static-topology and dynamic-graph dependencies in the spatial view, but also uses a Transformer to achieve deep fusion of features in the spatial-temporal view.

Convolutions on Graphs

Convolutional neural networks (CNN) and their variants have achieved impressive performance in various applications. Various models based on graph neural networks (GNN) have made contributions from two perspectives: spectral-domain graph convolution and spatial-domain graph convolution. The main spectral-domain graph convolution methods include SCNN [37], ChebNet [38], and GCN [39]. SCNN is a direct application of spectral graph convolution theory. ChebNet uses Chebyshev polynomials to reduce the computational complexity of SCNN, while GCN further simplifies ChebNet and is well suited to the corresponding tasks. The main spatial-domain graph convolution methods include GraphSAGE [40] and GAT [41]. GraphSAGE combines neighbor sampling with information aggregation: it samples the neighbors of each node and aggregates information from the node itself and its neighbors. GAT introduces an attention mechanism into the convolution operation and uses it to dynamically adjust the importance of adjacent nodes.
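The simplification GCN arrives at can be shown in a few lines: one propagation step multiplies the features by a symmetrically normalized adjacency with self-loops, then by a weight matrix. This is a minimal NumPy sketch of the standard Kipf-Welling rule [39], not any particular library's implementation.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

rng = np.random.default_rng(3)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)        # 3-node path graph
H = rng.standard_normal((3, 4))               # node features
W = rng.standard_normal((4, 2))               # layer weights
H_next = gcn_layer(A, H, W)
```

Each node's new feature is thus a degree-weighted average over its neighborhood (including itself), followed by a shared linear transform.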

Attention Mechanism

Transformer is a model that was originally used in the field of natural language processing and is now widely used in various applications. It is a deep neural network based on the attention mechanism [42]. Several classical methods utilize attention mechanisms to better predict traffic flow. Liang et al. [43] proposed a multi-level attention layer to extract dynamic spatio-temporal dependencies, combined with a fusion module to capture external factors (such as weather). Zhang et al. [44] designed a graph-based attention method to improve the ability to capture spatial correlations and better predict traffic flow from multiple sensors' data. In another popular study, Guo et al. [14] designed a novel attention mechanism and combined it with GCN to solve the traffic flow prediction task, focusing on space-time attention and capturing dynamic space-time correlations.

In this work, we build an STT to extract spatio-temporal features and deeply fuse the spatio-temporal features of different views using a multi-head self-attention mechanism [45]. The core idea of the self-attention strategy is to model the interrelationships within a sequence, letting the model determine on its own the weights assigned to the input items.
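A small NumPy sketch of multi-head scaled dot-product self-attention illustrates this weighting idea; this is the generic Transformer attention of [45], not MVSTT's exact STT module, and the projection weights here are hypothetical random matrices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise compatibilities
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = scores / scores.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ V, weights

rng = np.random.default_rng(4)
seq_len, d_model, num_heads = 6, 8, 2
d_k = d_model // num_heads
X = rng.standard_normal((seq_len, d_model))   # input sequence

# One Q/K/V projection per head; outputs are concatenated.
heads = []
for h in range(num_heads):
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
    out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
    heads.append(out)

multi_head = np.concatenate(heads, axis=-1)   # (seq_len, d_model)
```

Each row of `w` is a learned, input-dependent distribution over the sequence, which is exactly the "self-determined weighting" described above.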

Origin blog.csdn.net/qq_30340349/article/details/131466963