Article information
The paper is titled "Adaptive graph convolutional network-based short-term passenger flow prediction for metro". It addresses short-term passenger flow prediction for rail transit using an adaptive graph convolutional network and was published in the Journal of Intelligent Transportation Systems in May 2023.
Summary
As urbanization accelerates, urban rail transit systems have grown into large networks, and the topology between stations has become increasingly complex, which makes spatial dependence harder to capture. Multiple intersecting lines give the stations a complex topological distribution. Traditional graph convolutional networks (GCNs) rely on adjacency matrices built from prior knowledge, which cannot reflect the actual spatial dependence between stations. To address these problems, the paper proposes an adaptive graph convolutional network model (Adapt-GCN), which replaces the fixed, prior-knowledge adjacency matrix of traditional GCN with a trainable adaptive adjacency matrix. This both adjusts the correlation weights between adjacent stations and adaptively captures the spatial dependence between non-adjacent stations. The paper uses the Shanghai subway dataset to verify that the model improves prediction accuracy and reduces training time.
Introduction
Significance
With the rapid development of domestic intelligent transportation systems and the subway industry, smart IC cards have become widespread and generate a large amount of card data. On this basis, an accurate and effective passenger flow prediction model can be built to track trends in passenger flow, give urban rail transit managers a sound basis for decisions, plan smoother travel routes, and help passengers choose more suitable travel times, thereby avoiding or alleviating congestion on urban rail transit. Studying rail transit passenger flow prediction therefore has both theoretical significance and practical value.
Research review
There is a large body of work on rail transit passenger flow prediction models, which can be roughly divided into three categories. The first category consists of models based on mathematical statistics, such as the autoregressive integrated moving average (ARIMA) model. These methods predict the passenger flow of a single station only, so time is the only factor they can consider, and the spatial dependence between stations is ignored. The second category consists of non-parametric intelligent prediction models, such as the long short-term memory (LSTM) model. These methods predict passenger flow in only one direction, inbound or outbound, and do not predict both simultaneously, which hampers targeted safety deployment by urban rail transit management departments. The third category consists of hybrid models based on multi-feature extraction, which use a convolutional neural network (CNN) to learn spatial features from subway passenger flow image data and a bidirectional LSTM to extract temporal features from passenger flow time series, then fuse the spatiotemporal features through a fully connected network to obtain the prediction. Although adding multi-source heterogeneous information such as weather and air conditions can improve accuracy, this information is difficult to collect and process. Moreover, the information redundancy of multi-source data and an overly complex model structure make the model inefficient.
Research contribution
This paper mainly studies the spatial feature mining of passenger flow changes in multi-line rail transit networks. The research idea is to achieve simultaneous prediction of passenger flows in and out of multiple stations based on the improved GCN model, and the model is required to be as simple and easy to operate as possible. The main contributions are as follows:
1. This paper studies subway passenger flow prediction and proposes a new graph convolution module, the adaptive graph convolutional network (Adapt-GCN), which adaptively captures the spatial correlation between both adjacent and non-adjacent stations.
2. The fixed adjacency matrix in traditional GCN is replaced with a trainable adaptive adjacency matrix, and short-term subway passenger flow prediction is achieved by stacking multiple Adapt-GCN layers and adding residual connections.
3. We conducted extensive experiments on real data sets. Experimental results show that our model consistently outperforms all baseline models.
Problem definition
Urban rail transit passenger flow prediction is a typical spatio-temporal sequence prediction problem: the observations from the previous r time steps are used to predict the inbound and outbound passenger flow of each station at the future time step. This paper defines a city-wide urban rail transit network graph and uses a graph convolutional network (GCN) to learn its spatial characteristics. Common spatial characteristics of urban rail transit networks include the following. Because of population density and POI distribution, passenger flow changes at different stations are strongly similar. Because of popular routes, the passenger flow interactions between stations are pronounced: for example, most inbound passengers at station a exit at station b, or the outbound passengers at station a mainly come from station b. GCN is commonly used to extract these spatial features of rail transit networks.
Rail transit network diagram
In this study, we define the urban rail transit network as a graph. The rail transit network graph consists of nodes, edges and edge weights, and is usually written as G = (V, E, A), where V is the set of all stations (each node carries the observations of one subway station in the network), E is the set of edges indicating connectivity between stations, and A is the adjacency matrix indicating whether two stations are adjacent.
From the above definition, the urban rail transit network graph G is a simple undirected graph, where A is an adjacency matrix containing only 0 and 1, with all diagonal elements equal to 0. For a simple undirected graph with n vertices, the Laplacian matrix is defined as $L = D - A$, where A is the adjacency matrix and D is the degree matrix of A, a diagonal matrix with $D_{ii} = \sum_j A_{ij}$. The Laplacian matrix is normalized as follows:
$$L^{sym} = D^{-1/2} L D^{-1/2} = I_n - D^{-1/2} A D^{-1/2}$$
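The definitions above can be checked on a small example. The following is a minimal sketch in plain Python, assuming the symmetric normalization $I_n - D^{-1/2} A D^{-1/2}$ that is standard in the GCN literature; the four-station line and all function names are hypothetical and not taken from the paper.

```python
import math

def degree_vector(adj):
    """Diagonal of the degree matrix D: D_ii = sum_j A_ij."""
    return [sum(row) for row in adj]

def laplacian(adj):
    """Combinatorial Laplacian L = D - A."""
    deg = degree_vector(adj)
    n = len(adj)
    return [[(deg[i] if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.

    Isolated nodes (degree 0) get a zero scaling factor here; conventions
    for that corner case vary and this choice is only an assumption.
    """
    deg = degree_vector(adj)
    n = len(adj)
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]

# Hypothetical four-station line: 0 - 1 - 2 - 3
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(ADJ)
L_NORM = normalized_laplacian(ADJ)
```

Each row of `L` sums to zero, and `L_NORM` stays symmetric with ones on the diagonal, which are quick sanity checks for any adjacency matrix built from a real station map.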
Passenger flow characteristics
This article uses passenger flow time series as the attribute features of each station (node) in the subway network, expressed as $X \in \mathbb{R}^{n \times r \times 2}$, where n is the number of stations, r is the number of attribute features, that is, the window length of the time series, and 2 corresponds to the inbound and outbound passenger flows, as shown in Figure 1. In the figure, $T_r$ denotes the observed passenger flow of the m stations at the r-th time step, $S_m$ denotes the m-th station, and in_flow and out_flow denote the inbound and outbound passenger flows respectively:
Figure 1 Passenger flow characteristic diagram
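To make the feature layout concrete, here is a minimal sketch of how such windows could be cut from a flow series. It assumes a stations × window × (in_flow, out_flow) layout and a one-step-ahead target; the function name `make_windows` and the toy numbers are hypothetical.

```python
def make_windows(flows, r):
    """Cut sliding windows from a passenger flow series.

    flows[t][s] is an (in_flow, out_flow) pair for station s at time step t.
    Each sample is (x, y) where x[s][k] is the pair for station s at the k-th
    step of the window (n stations x r steps x 2 directions) and y[s] is the
    pair at the step immediately after the window.
    """
    n_steps = len(flows)
    n_stations = len(flows[0])
    samples = []
    for start in range(n_steps - r):
        x = [[flows[start + k][s] for k in range(r)] for s in range(n_stations)]
        y = [flows[start + r][s] for s in range(n_stations)]
        samples.append((x, y))
    return samples

# Toy series: 5 time steps, 2 stations, made-up counts
FLOWS = [[(10 * t + s, 100 * t + s) for s in range(2)] for t in range(5)]
SAMPLES = make_windows(FLOWS, r=3)
```

With 5 time steps and a window of r = 3, two samples are produced; each input has one row per station, r entries per row, and two values per entry.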
Model algorithm
Graph Convolutional Network (GCN)
In this study, the authors use a simplified version of GCN. Each graph convolution layer uses a first-order approximation, so it aggregates features only from directly adjacent (first-order) neighbors. Multiple graph convolution layers are then stacked according to the layer-wise propagation rule so that features propagate across multi-hop neighbors. The resulting change in the receptive field is shown in Figure 2.
Figure 2 Changes in the receptive field of GCN
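The original graph convolution formula is rescaled and generalized to multiple feature dimensions, giving the final graph convolution formula:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$
where $\tilde{A} = A + I_n$, $I_n$ is an identity matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $H^{(l)}$ is the feature matrix of layer l with $H^{(0)} = X$, $W^{(l)}$ is the weight matrix of layer l, and $\sigma$ is the activation function. The above formula is the output of one GCN layer. Two stacked GCN layers can be expressed as:
$$Z = \sigma\left(\hat{A}\, \sigma\left(\hat{A} X W^{(0)}\right) W^{(1)}\right), \quad \hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$$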
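The growth of the receptive field with depth can be seen in a few lines of code. The sketch below assumes the standard first-order propagation rule $\sigma(\tilde{D}^{-1/2}(A + I_n)\tilde{D}^{-1/2} H W)$ with ReLU as the activation; the three-station line, the one-hot input, and the identity weights are hypothetical choices made only to keep the numbers readable.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def normalize_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2}, the renormalized adjacency."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # always >= 1 thanks to self-loops
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(a_hat, h, w):
    """One graph convolution layer: ReLU(A_hat @ H @ W)."""
    return relu(matmul(matmul(a_hat, h), w))

# Hypothetical three-station line 0 - 1 - 2; only station 0 carries a signal.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_HAT = normalize_adjacency(ADJ)
X = [[1.0], [0.0], [0.0]]
W = [[1.0]]  # identity weights, so only the aggregation step is visible

H1 = gcn_layer(A_HAT, X, W)   # one layer: reaches station 1, not station 2
H2 = gcn_layer(A_HAT, H1, W)  # two layers: the signal now reaches station 2
```

After one layer the entry for station 2 is still zero because it is two hops from station 0; after two layers it becomes non-zero, which is the receptive-field expansion described above.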
Adaptive graph convolutional network
The traditional GCN formula can be simplified to $Z = \tilde{A} X W$, where the activation function and the normalization of the adjacency matrix are omitted. In this simplified form, traditional GCN splits the convolution into two steps. The first step aggregates the feature information of a node and its adjacent nodes through $\tilde{A} X$, where $\tilde{A} = A + I_n$ can be regarded as an adjacency matrix with self-connections whose entries are 0 or 1. This aggregation is simply the sum of the features of the node itself and its neighbors. The second step is the feature transformation by W, which mines the nonlinear features between nodes, so W can be called the feature transformation matrix. Taking the simple topological network in Figure 2 as an example, the figure shows the aggregation process in the traditional GCN formula, that is, a simple summation of the features of a node and its neighbors. As shown in Figure 3, the adjacency matrix of traditional GCN is fixed and remains unchanged throughout the training of the neural network.
Figure 3 Traditional GCN
However, this article argues that a fixed adjacency matrix severely limits how GCN aggregates node features: it can neither control how strongly a node and its adjacent nodes influence the result, nor learn feature information from non-adjacent nodes. This article therefore sets up a randomly initialized adjacency matrix and learns and optimizes it through neural network training, constructing an adjacency matrix better suited to the data itself. This adjacency matrix requires no prior knowledge and is designed to adaptively model and capture hidden spatial correlations. The GCN model optimized in this way is called the adaptive graph convolutional network (Adapt-GCN for short). It is calculated as follows:
$$Z = SXW$$
In the formula, S is the randomly initialized adjacency matrix in Adapt-GCN, and Figure 4 shows the aggregation process of SX in Adapt-GCN. As the figure shows, S is no longer a fixed adjacency matrix but a randomly initialized matrix with the same shape as the adjacency matrix. Through neural network training, the model can learn the correlation between adjacent nodes more freely and can also learn the correlation between non-adjacent nodes. By stacking multiple Adapt-GCN layers, adding a residual structure, and finally applying a fusion layer, the Adapt-GCN model for short-term prediction of inbound and outbound passenger flow at multiple stations in the subway network is constructed. Its structure is shown in Figure 5.
Figure 4 Adapt-GCN
Figure 5 Adapt-GCN model structure diagram
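The idea of treating the adjacency matrix as a trainable parameter can be illustrated with a small, dependency-free sketch. It assumes the simplified form Z = SXW with a mean squared error loss and updates only S by hand-derived gradient descent; in practice a deep learning framework would also train W and compute gradients automatically. The data, learning rate, and function names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def mse(z, y):
    count = len(z) * len(z[0])
    return sum((z[i][j] - y[i][j]) ** 2
               for i in range(len(z)) for j in range(len(z[0]))) / count

def adapt_gcn_forward(s, x, w):
    """Simplified Adapt-GCN aggregation and transformation: Z = S X W."""
    return matmul(s, matmul(x, w))

def grad_wrt_s(s, x, w, y):
    """Gradient of the MSE loss with respect to S: dL/dZ @ (XW)^T."""
    xw = matmul(x, w)
    z = matmul(s, xw)
    count = len(z) * len(z[0])
    dz = [[2.0 * (z[i][j] - y[i][j]) / count for j in range(len(z[0]))]
          for i in range(len(z))]
    return matmul(dz, transpose(xw))

random.seed(0)
N = 3                          # number of stations (the only prior needed)
X = [[1.0], [2.0], [3.0]]      # made-up station features
W = [[1.0]]                    # kept fixed here to isolate the effect of S
Y = [[3.0], [2.0], [1.0]]      # made-up targets: station 0 tracks station 2

# S starts as a random N x N matrix instead of a fixed 0/1 adjacency matrix.
S = [[random.random() for _ in range(N)] for _ in range(N)]
LOSS_BEFORE = mse(adapt_gcn_forward(S, X, W), Y)

LEARNING_RATE = 0.05
for _ in range(200):
    g = grad_wrt_s(S, X, W, Y)
    S = [[S[i][j] - LEARNING_RATE * g[i][j] for j in range(N)]
         for i in range(N)]

LOSS_AFTER = mse(adapt_gcn_forward(S, X, W), Y)
```

Because every entry of S receives a gradient, the weight linking stations 0 and 2 can change even though those stations would not be adjacent in a line-shaped physical graph, which a fixed 0/1 matrix cannot express.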
Experiment
Data set
The passenger flow data in this article uses the Shanghai subway data set, which contains 288 subway stations. The topological distribution between stations is as follows:
Figure 6 Subway station distribution map
The data range is from July 1 to September 30, 2016, and only the inbound and outbound passenger flow from 5:30 to 23:45 every day is predicted. For each station, the number of people entering and exiting the station is counted every 15 minutes. A total of 73 time steps are generated in one day, and a total of 6716 pieces of data are generated in 92 days. All data are divided into training set, validation set and test set, with sizes of 62 days, 9 days and 21 days respectively.
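The sample counts quoted above follow from simple arithmetic, which the short sketch below reproduces. It assumes the 5:30 to 23:45 window is divided into consecutive 15-minute intervals; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Daily observation window and aggregation interval from the text
DAY_START = datetime(2016, 7, 1, 5, 30)
DAY_END = datetime(2016, 7, 1, 23, 45)
INTERVAL = timedelta(minutes=15)

# Number of 15-minute intervals per day (timedelta // timedelta gives an int)
STEPS_PER_DAY = (DAY_END - DAY_START) // INTERVAL

# July 1 to September 30, 2016, both days included
DAYS = (datetime(2016, 9, 30) - datetime(2016, 7, 1)).days + 1

TOTAL_STEPS = STEPS_PER_DAY * DAYS

# Split sizes in days, as stated in the text
TRAIN_DAYS, VAL_DAYS, TEST_DAYS = 62, 9, 21
```

The three split sizes add up to the full 92 days, so no day is left unassigned.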
Evaluation metrics and benchmark models
To evaluate the model's predictions, the mean absolute error (MAE) and root mean square error (RMSE) are used to quantify prediction accuracy. The errors are calculated as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
In the formulas, $y_i$ is the real flow value of the i-th test sample, $\hat{y}_i$ is the predicted flow value of the i-th test sample, and n is the total number of test samples. To verify the effectiveness of the model, four classic models, STGCN, ResNet, ResGCN, and JKResGCN, were selected for performance comparison with Adapt-GCN.
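The two error metrics are easy to compute directly. The sketch below uses the standard definitions of MAE and RMSE; the flow values are made up for illustration and are not results from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the average squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

# Made-up passenger counts for four test samples
Y_TRUE = [100.0, 150.0, 200.0, 250.0]
Y_PRED = [110.0, 140.0, 200.0, 270.0]

MAE_VALUE = mae(Y_TRUE, Y_PRED)    # errors 10, 10, 0, 20 -> 10.0
RMSE_VALUE = rmse(Y_TRUE, Y_PRED)  # sqrt((100 + 100 + 0 + 400) / 4)
```

RMSE is larger than MAE here because squaring gives the single 20-passenger miss more weight, which is why the two metrics are usually reported together.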
Forecast accuracy analysis
Among the comparison models: ① STGCN uses a sandwich structure in which a graph convolution layer sits between two one-dimensional temporal gated convolutions to form a spatiotemporal convolution block, and builds a deep graph convolutional network by stacking four such blocks. Although this model captures spatial and temporal features simultaneously, it feeds the feature matrix output by the 1D CNN into the GCN, so the GCN captures spatial features less fully than it would by mining them directly from the original input data. ② The CNN structure in ResNet suits regular grid (raster) data and is poorly suited to general graph-structured data. ③ When multiple GCN layers are stacked in ResGCN, over-smoothing occurs, which removes too much of the variation and detail in the data. ④ JKResGCN adds a jumping knowledge network to ResGCN, which effectively addresses over-smoothing. However, the jumping knowledge network concatenates the output of every GCN layer into the final fusion layer, so as more GCN layers are stacked the number of input channels of the fusion layer grows, and the fusion layer spends more time on feature learning.
Table 1 Accuracy comparison
The impact of the number of GCN layers
By varying the number of stacked GCN layers, the prediction accuracy (RMSE) for total passenger flow and the training time of four models, ResGCN, JKResGCN, ResNet and Adapt-GCN, were compared in detail. The results are shown in Figure 7. Regardless of how many layers are stacked, Adapt-GCN has the best prediction accuracy, especially in shallow networks. This is due to the adaptive adjacency matrix S in Adapt-GCN, which lets the model capture the spatial correlation between distant stations even in shallow networks. As the number of stacked layers increases, however, the receptive fields of the other three models gradually expand, and the accuracy advantage of Adapt-GCN becomes less pronounced. In terms of training time, Adapt-GCN is slower than the CNN-based ResNet, because CNNs parallelize better during training. Compared with ResGCN, which uses a fixed adjacency matrix, feature learning in Adapt-GCN is more flexible, and although the number of learnable parameters increases, the overall training time decreases.
Figure 7 Impact of GCN layer
Ablation experiment
To evaluate the effectiveness of the model's components, ablation experiments were designed and variants of Adapt-GCN were tested on the Shanghai subway dataset. Specifically, Adapt-GCN w/o residual denotes the Adapt-GCN model with the residual connection block removed. Table 2 shows the prediction results after removing the residual connections. As the table shows, on the Shanghai dataset the full Adapt-GCN outperforms Adapt-GCN without residuals, indicating that residual connections improve prediction performance.
Table 2 Ablation experiment
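For readers unfamiliar with residual connections, the sketch below shows one common form, in which each block adds its input back to its output. The text does not spell out the exact wiring used in Adapt-GCN, so this form, the function names, and the toy matrices are all assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def adapt_gcn_block(s, h, w, residual=True):
    """ReLU(S H W), optionally with the input H added back (skip path).

    The skip path requires W to be square so input and output shapes match.
    """
    out = relu(matmul(s, matmul(h, w)))
    if residual:
        out = [[o + x for o, x in zip(out_row, h_row)]
               for out_row, h_row in zip(out, h)]
    return out

def stack_blocks(s, w, x, depth, residual=True):
    """Apply the same block `depth` times (weights shared only for brevity)."""
    h = x
    for _ in range(depth):
        h = adapt_gcn_block(s, h, w, residual)
    return h

# Extreme toy case: weights that contribute nothing (all zeros)
S = [[0.5, 0.5], [0.5, 0.5]]
W_ZERO = [[0.0, 0.0], [0.0, 0.0]]
X = [[1.0, 2.0], [3.0, 4.0]]

WITH_RESIDUAL = stack_blocks(S, W_ZERO, X, depth=3, residual=True)
WITHOUT_RESIDUAL = stack_blocks(S, W_ZERO, X, depth=3, residual=False)
```

In this deliberately extreme case the residual variant passes the input through unchanged while the plain stack loses it entirely, which illustrates why skip paths help preserve information as layers are stacked.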
Adjacency matrix analysis
For the Shanghai subway dataset, the physical graph (static adjacency matrix) describing the connectivity between stations and the adaptive adjacency matrix obtained by training a single-layer Adapt-GCN are visualized separately. The comparison is shown in Figure 8. The left panel is the 288×288 physical graph of the Shanghai subway dataset, that is, the fixed adjacency matrix A, in which dark areas have the value 1 and bright areas have the value 0. The values in different areas represent the connectivity between different stations. The right panel shows the adaptive adjacency matrix S obtained by neural network training. The darker the color, the stronger the correlation between two stations, and the lighter the color, the weaker the correlation. Both the adaptive adjacency matrix S and the fixed adjacency matrix A show a clear diagonal, which represents the strong correlation between adjacent stations. In addition, the value range of S is wider than that of A, which contains only 0 and 1, showing that S can learn the correlation between stations more flexibly. The light-colored areas in the left panel correspond to non-zero elements in the right panel, which shows that even a single Adapt-GCN layer can capture the spatial correlation between distant stations.
Figure 8 Adjacency matrix comparison
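The comparison described above can also be done numerically by listing the station pairs that are not physically connected yet carry a noticeable learned weight. The sketch below assumes a simple magnitude threshold; the matrices, the threshold, and the function name are hypothetical and do not come from the trained model.

```python
def non_adjacent_links(adj, learned, threshold):
    """List (i, j, weight) for pairs with no physical edge but |S_ij| > threshold."""
    n = len(adj)
    return [(i, j, learned[i][j])
            for i in range(n) for j in range(n)
            if i != j and adj[i][j] == 0 and abs(learned[i][j]) > threshold]

# Hypothetical four-station line 0 - 1 - 2 - 3 (fixed adjacency matrix A)
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Made-up learned matrix S: strong diagonal, plus a link between 0 and 3
S = [
    [0.90, 0.60, 0.02, 0.40],
    [0.60, 0.80, 0.50, 0.01],
    [0.03, 0.50, 0.85, 0.70],
    [0.35, 0.00, 0.70, 0.90],
]

LINKS = non_adjacent_links(A, S, threshold=0.1)
```

Here the two ends of the line, stations 0 and 3, are flagged in both directions, the kind of distant-station correlation that a fixed 0/1 adjacency matrix would set to zero.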
Conclusion
This paper proposes an improved GCN for short-term passenger flow prediction at multiple stations in a multi-line rail transit network. To account for the topological structure in the spatial distribution of stations, the fixed adjacency matrix in traditional GCN is replaced with a randomly initialized matrix of the same shape, which is optimized through backpropagation. Because the method is no longer constrained by a fixed adjacency matrix and instead lets the GCN model adaptively learn the topological relationship between stations from the data, the spatial correlation between distant stations can be fully captured. Experiments show that the model outperforms other existing methods on real-world datasets. In addition, the randomly initialized matrix does not require the real topological relationship between stations, only the total number of stations, which greatly eases the construction of the adjacency matrix. Although this model outperforms other methods, it does not consider how differences in passenger flow patterns between weekdays and weekends affect prediction accuracy. In future work, we will pay more attention to mining temporal correlation and consider temporal periodicity. For example, the morning and evening peaks of each weekday are usually similar, and the passenger flow patterns of weekends are also similar to one another. We will consider such time factors to further improve prediction accuracy.