Short-term passenger flow prediction for rail transit based on adaptive graph convolution network

  •  Article information

The paper is titled "Adaptive graph convolutional network-based short-term passenger flow prediction for metro" and was published in the Journal of Intelligent Transportation Systems in May 2023. It addresses short-term passenger flow prediction for rail transit using an adaptive graph convolutional network.

  •  Summary

As urbanization accelerates, urban rail transit systems have grown into large networks. Multiple lines crisscross, the topology between stations becomes increasingly complex, and spatial dependence becomes harder to capture. Traditional graph convolutional networks (GCNs) rely on an adjacency matrix built from prior knowledge, which cannot reflect the actual spatial dependence between stations. To address this, the paper proposes an adaptive graph convolutional network model (Adapt-GCN), which replaces the fixed, prior-knowledge adjacency matrix of traditional GCN with a trainable adaptive adjacency matrix. This both adjusts the correlation weights between adjacent stations and adaptively captures the spatial dependence between non-adjacent stations. Experiments on the Shanghai subway dataset verify that the model improves prediction accuracy and reduces training time.

  • Introduction

  •  Significance

With the rapid development of domestic intelligent transportation systems and the subway industry, smart IC cards have become widespread, generating a large amount of card data. On this basis, an accurate and effective passenger flow prediction model can be built to track changing passenger flow trends, give urban rail transit managers a sound basis for decisions, and help passengers plan smoother routes and choose more suitable travel times, thereby avoiding or alleviating congestion on urban rail transit. Studying rail transit passenger flow prediction therefore has both theoretical significance and practical value.

  • Research review

There has been a lot of research on rail transit passenger flow prediction models, which can be roughly divided into three categories. The first category is based on mathematical statistics, such as the autoregressive integrated moving average (ARIMA) model. These methods predict the passenger flow of a single station only, so the only factor they can consider is time, and they ignore the spatial dependence between stations. The second category consists of non-parametric intelligent prediction models, such as the long short-term memory (LSTM) model. These methods predict passenger flow in one direction only, either inbound or outbound, and do not predict both at the same time, which hampers targeted safety deployment by urban rail transit management departments. The third category consists of hybrid models based on multi-feature extraction: a convolutional neural network (CNN) learns spatial features from subway passenger flow image data, a bidirectional LSTM extracts temporal features from the passenger flow time series, and a fully connected network fuses the spatio-temporal features to produce the prediction. Although adding multi-source heterogeneous information such as weather and air conditions can improve accuracy, this information is difficult to collect and process. Moreover, the information redundancy of multi-source data and the excessive complexity of the model structure make the model inefficient.

  •  Research contribution

This paper mainly studies the spatial feature mining of passenger flow changes in multi-line rail transit networks. The research idea is to achieve simultaneous prediction of passenger flows in and out of multiple stations based on the improved GCN model, and the model is required to be as simple and easy to operate as possible. The main contributions are as follows:

1. This paper studies subway passenger flow prediction and proposes a new graph convolution module, the adaptive graph convolutional network (Adapt-GCN), which uses an adaptive technique to capture the spatial correlation between both adjacent and non-adjacent stations.

2. The fixed adjacency matrix in traditional GCN is replaced with a trainable adaptive adjacency matrix, and short-term subway passenger flow prediction is achieved by stacking multiple Adapt-GCN layers and adding a residual structure.

3. We conducted extensive experiments on real data sets. Experimental results show that our model consistently outperforms all baseline models.

  • Problem definition

Urban rail transit passenger flow prediction is a typical spatio-temporal sequence prediction problem: the observations from r historical time steps are used to predict the inbound and outbound passenger flow of each station at the future time step. This paper defines a city-wide urban rail transit network graph and uses a graph convolutional network (GCN) to learn its spatial characteristics. Two spatial characteristics are common in urban rail transit networks. First, because of population density and POI distribution, passenger flow changes at different stations are strongly similar. Second, because of popular routes, passenger flow between stations interacts in clear directions: for example, most passengers entering at station a exit at station b, or most passengers exiting at station a come from station b. GCN is commonly used to extract such spatial features of rail transit networks.

  •  Rail transit network diagram

In this study, we define the urban rail transit network as a graph. The rail transit network graph consists of nodes, edges and edge weights, and is usually written as G = (V, E, A), where V is the set of all stations (each node carries the observations of one subway station in the network), E is the set of edges, indicating the connectivity between stations, and A is the adjacency matrix, indicating whether two stations are adjacent.

From the above definition, the urban rail transit network graph G is a simple undirected graph, where A is an adjacency matrix whose elements are only 0 and 1 and whose diagonal elements are all 0. For a simple undirected graph with n vertices, the Laplacian matrix (62cbaf6f8f54cf5d5d9beab12cc862bf.png) is defined as L = D - A, where A is the adjacency matrix and D is the degree matrix of A (9dbefb3df3a5a756d71eb3a4cee1d6c9.png). The Laplacian matrix is normalized as follows:

bbd3ddd3db73303d706edf47a89dcfa4.png
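The definitions above can be checked on a toy graph. The sketch below is only an illustration: it builds L = D - A for a hypothetical four-station line and, as an assumption about what the formula image shows, uses the standard symmetric normalization I - D^(-1/2) A D^(-1/2). The function name `laplacians` and the toy adjacency matrix are made up for this example.

```python
import math

def laplacians(adj):
    """Return (L, L_norm) for a simple undirected graph given as a 0/1 adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # D_ii = sum_j A_ij
    # Combinatorial Laplacian L = D - A
    lap = [[(deg[i] if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]
    # Assumed symmetric normalization: I - D^{-1/2} A D^{-1/2}
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    lap_norm = [[(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
                 for j in range(n)] for i in range(n)]
    return lap, lap_norm

# Hypothetical 4-station line: 0 - 1 - 2 - 3
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
L, L_norm = laplacians(A)
for row in L:
    print(row)
```

Each row of L sums to zero, and the diagonal of the normalized matrix is 1 for every station that has at least one neighbor.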

  • Passenger flow characteristics

This article uses passenger flow time series data as the attribute features of each station (node) in the subway network, expressed as ba367e47ffeaa76c8e0965a0427fbb73.png, where n is the number of stations, r is the number of attribute features (that is, the window length of the time series), and 2 corresponds to the inbound and outbound passenger flows. As shown in Figure 1, Tr denotes the observed passenger flow of the m stations at the r-th time step, Sm denotes the m-th station, and in_flow and out_flow denote the inbound and outbound passenger flows respectively:

19dccc7869d0a9f5496a8e5644202d68.png

Figure 1 Passenger flow characteristic diagram
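A sliding window is one plausible way to turn raw station counts into features of this kind. The sketch below assumes a layout of (stations, window length, 2) and assumes the target is the time step right after the window; neither detail is confirmed by the article, and the function name `window_features` and the toy counts are hypothetical.

```python
def window_features(in_flow, out_flow, r):
    """Build (X, y) samples from per-station inbound/outbound series.

    in_flow[s][t] and out_flow[s][t] are counts for station s at time step t.
    X has shape (n, r, 2): X[s][k] = [inbound, outbound] at step t + k.
    y has shape (n, 2): the flows at step t + r (assumed prediction target).
    """
    n, total_steps = len(in_flow), len(in_flow[0])
    samples = []
    for t in range(total_steps - r):
        X = [[[in_flow[s][t + k], out_flow[s][t + k]] for k in range(r)] for s in range(n)]
        y = [[in_flow[s][t + r], out_flow[s][t + r]] for s in range(n)]
        samples.append((X, y))
    return samples

# Toy data: 2 stations, 5 time steps, window length r = 3
in_flow = [[10, 12, 15, 11, 9], [3, 4, 6, 5, 2]]
out_flow = [[8, 9, 14, 13, 10], [2, 5, 5, 4, 3]]
samples = window_features(in_flow, out_flow, r=3)
print(len(samples), "samples; first X for station 0:", samples[0][0][0])
```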

  •  Model algorithm

  • Graph Convolutional Network (GCN)

In this study, the authors use a simplified version of GCN. Each graph convolution layer uses a first-order approximation, so it processes only the features of directly adjacent neighbors. Multiple graph convolution layers are then stacked according to the layer-wise propagation rule, so that features propagate across multi-hop neighbors. The resulting change in the receptive field is shown in Figure 2.

2c7ec875142a51a224d1c3995660d559.png

Figure 2 Changes in the receptive field of GCN

The original graph convolution formula is rescaled and dimensionally generalized, and the final graph convolution formula is obtained as follows:

3f6575efeb76a6c9dcf7bf76dafd24d4.png

4043235fc4cd74b82391c3df414ee3ce.png is an identity matrix, cf59d9aa4f5030e41ac8b08dd1c2e111.png, 56253f577dce77b96877a7f41f7d00e3.png is the activation function. The above formula is the output of one layer of GCN. Two layers of stacked GCN can be expressed as:

26d984b7249a64311a92f73be72bed35.png
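To make the receptive-field argument concrete, here is a minimal sketch of a first-order GCN layer. It assumes the widely used renormalized form sigma(D~^(-1/2) (A + I) D~^(-1/2) H W) with ReLU as the activation; the article only says that an activation function is used, so ReLU, the helper names, and the toy graph are all assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """D~^{-1/2} (A + I) D~^{-1/2}, with self-loops added before normalizing."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def gcn_layer(a_hat, h, w):
    """One layer: ReLU(A_hat @ H @ W). ReLU is an assumed activation."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical 4-station line 0 - 1 - 2 - 3, with a signal only at station 0
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
A_hat = normalized_adjacency(A)
H = [[1.0], [0.0], [0.0], [0.0]]
W = [[1.0]]  # identity weight so only the aggregation is visible

H1 = gcn_layer(A_hat, H, W)   # one layer reaches 1-hop neighbors
H2 = gcn_layer(A_hat, H1, W)  # two layers reach 2-hop neighbors
print([row[0] > 0 for row in H1])
print([row[0] > 0 for row in H2])
```

With a fixed adjacency matrix, station 3 (three hops away) still receives nothing after two layers, which is the limitation the adaptive adjacency matrix is meant to remove.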

  • Adaptive graph convolutional network

The traditional GCN formula can be simplified to 22f2b6cadeb81f2bd86381eb75c7c4b0.png, where the activation function and the normalization of the adjacency matrix are omitted. In this simplified form, traditional GCN splits the convolution into two steps. The first step aggregates the feature information of a node and its adjacent nodes through 0f3fccdc73fe6b734516233e710d3ba4.png, which can be regarded as an adjacency matrix whose entries are 0 or 1. This aggregation is simply the sum of the features of the node itself and its neighbors. The second step is the feature transformation by W, which mines the nonlinear features between nodes, so W can be called the feature transformation matrix. Taking the simple topological network in Figure 2 as an example, the aggregation in the traditional GCN formula is a simple summation of the features of a node and its neighbors. As shown in Figure 3, the adjacency matrix of traditional GCN is fixed and remains unchanged throughout the training of the neural network. 5a8fb551378ed57d5f0a418b0f2e31f5.png

5c690e90a69ae737fec92cb782ee91e6.png

Figure 3 Traditional GCN

However, this article argues that a fixed adjacency matrix severely limits how GCN aggregates node features. It can neither control how strongly a node and its adjacent nodes influence the result, nor learn feature information from non-adjacent nodes. This article therefore sets up a randomly initialized adjacency matrix and learns and optimizes it through neural network training, constructing an adjacency matrix that better fits the data itself. This adjacency matrix requires no prior knowledge and is designed to adaptively model and capture hidden spatial correlations. The GCN model optimized in this way is called the adaptive graph convolutional network (Adapt-GCN for short). It is calculated as follows:

c3bbc3b4c1c8b8426c25491857bb3e56.png

In the formula, S is the randomly initialized adjacency matrix in Adapt-GCN, and Figure 4 shows the aggregation process of SX in Adapt-GCN. As the figure shows, the adjacency matrix S in Adapt-GCN is no longer fixed, but a randomly initialized matrix with the same shape as the adjacency matrix. Through neural network training, the model can learn the correlation between adjacent nodes more freely and can also learn the correlation between non-adjacent nodes. By stacking multiple Adapt-GCN layers, adding a residual structure, and finally applying a fusion layer, the Adapt-GCN short-term prediction model for inbound and outbound passenger flow at multiple stations of the subway network is constructed. Its structure is shown in Figure 5.

86864bf4777c70a41d6af1b36d08b853.png

Figure 4 Adapt-GCN
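The idea of a trainable S can be sketched without any deep learning framework. The example below assumes the simplified form Z = S X W with the activation omitted, a squared-error loss, and plain gradient descent on S only (W is held fixed to keep the sketch short). These training details, the function names, and the toy data are assumptions for illustration, not the authors' setup.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def adapt_gcn_forward(s, x, w):
    """Z = S X W, where S is a dense trainable matrix instead of a fixed 0/1 adjacency."""
    return matmul(matmul(s, x), w)

def loss(s, x, w, target):
    """0.5 * sum of squared errors between the layer output and the target."""
    z = adapt_gcn_forward(s, x, w)
    return 0.5 * sum((zi - ti) ** 2 for zr, tr in zip(z, target) for zi, ti in zip(zr, tr))

def sgd_step_on_s(s, x, w, target, lr=0.1):
    """One gradient step on S; dLoss/dS = (Z - Y) (X W)^T."""
    z = adapt_gcn_forward(s, x, w)
    err = [[zi - ti for zi, ti in zip(zr, tr)] for zr, tr in zip(z, target)]
    xw_t = [list(col) for col in zip(*matmul(x, w))]  # (X W)^T
    grad = matmul(err, xw_t)
    return [[sij - lr * gij for sij, gij in zip(sr, gr)] for sr, gr in zip(s, grad)]

random.seed(0)
n = 3  # hypothetical line 0 - 1 - 2, so stations 0 and 2 are not adjacent
S0 = [[random.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]  # random init
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5], [0.25]]
Y = [[1.0], [0.0], [0.5]]

S = S0
for _ in range(50):
    S = sgd_step_on_s(S, X, W, Y)
print("loss before:", loss(S0, X, W, Y), "after:", loss(S, X, W, Y))
```

Because every entry of S receives a gradient, the weight between stations 0 and 2 is updated even though they share no edge, which a fixed adjacency matrix would never allow.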

50645b2e0fc5a68420049150903fcc0f.png

Figure 5 Adapt-GCN model structure diagram
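The stacking with a residual structure can be pictured as follows. This is only a sketch under stated assumptions: each layer is taken to compute H + ReLU(S H W) with its own S and a square W so that shapes match, and the fusion layer is left out because the article does not describe it. The function names and values are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def residual_stack(h, layers):
    """Apply H <- H + ReLU(S H W) for each (S, W) pair (assumed residual form)."""
    for s, w in layers:
        h = add(h, relu(matmul(matmul(s, h), w)))
    return h

# Toy input: 3 stations, 2 features each
H0 = [[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]]
S = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.2], [0.1, 0.2, 1.0]]  # stand-in for a learned matrix
W = [[0.5, 0.0], [0.0, 0.5]]
out = residual_stack(H0, [(S, W), (S, W)])
print(out)
```

The skip path means a layer whose transformation contributes nothing simply passes its input through, which is one common motivation for residual connections in deep stacks.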

  • Experiments

  • Dataset

The passenger flow data in this article uses the Shanghai subway data set, which contains 288 subway stations. The topological distribution between stations is as follows:

e483d7f09dc6a036274f8a7e4ff0c110.png

Figure 6 Subway station distribution map

The data range is from July 1 to September 30, 2016, and only the inbound and outbound passenger flow from 5:30 to 23:45 every day is predicted. For each station, the number of people entering and exiting the station is counted every 15 minutes. A total of 73 time steps are generated in one day, and a total of 6716 pieces of data are generated in 92 days. All data are divided into training set, validation set and test set, with sizes of 62 days, 9 days and 21 days respectively.
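The figures in this paragraph can be verified with a few lines of date arithmetic. The sketch assumes that a "time step" means one 15-minute interval between 5:30 and 23:45; the variable names are made up for the example.

```python
from datetime import datetime, timedelta

# 15-minute intervals within one service day, 5:30 to 23:45
day_start = datetime(2016, 7, 1, 5, 30)
day_end = datetime(2016, 7, 1, 23, 45)
steps_per_day = (day_end - day_start) // timedelta(minutes=15)

# Inclusive number of days from July 1 to September 30, 2016
days = (datetime(2016, 9, 30) - datetime(2016, 7, 1)).days + 1

total_steps = steps_per_day * days
split_days = {"train": 62, "validation": 9, "test": 21}

print(steps_per_day, days, total_steps, sum(split_days.values()))
```

The result agrees with the article: 73 steps per day over 92 days gives 6716 time steps, and the three splits add up to the full 92 days.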

  • Evaluation metrics and benchmark models

In order to judge the prediction effect of the model, the mean absolute error (MAE) and root mean square error (RMSE) are used to quantify the accuracy of the prediction results. The error is calculated as follows:

43caa9f7e558e8a2ddd8ddf73ebb338d.png

In the formula, ff11b522999a3e24f50e5a13b633628a.png is the true flow value of the i-th test sample, 0e773b74fc9456e897b41ec6ac1102ee.png is the predicted flow value of the i-th test sample, and n is the total number of test samples. To verify the effectiveness of the model, four classic models (STGCN, ResNet, ResGCN, and JKResGCN) were selected for performance comparison with Adapt-GCN.
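Both metrics are easy to compute directly. The sketch below uses the standard definitions, MAE as the mean of the absolute errors and RMSE as the square root of the mean squared error, which are assumed to match the formula image; the sample values are invented purely to show the calculation.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y_i - y_hat_i|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: sqrt((1/n) * sum((y_i - y_hat_i)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Invented passenger counts for four test samples
y_true = [100, 150, 200, 50]
y_pred = [110, 140, 190, 70]
print("MAE:", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

RMSE is never smaller than MAE, and the gap grows when a few samples have large errors, which is why the two metrics are usually reported together.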

  • Forecast accuracy analysis

Among the comparison models: ① STGCN uses a sandwich structure in which two one-dimensional gated temporal convolutions enclose a graph convolution layer to form a spatio-temporal convolution block, and builds a deep graph convolutional network by stacking four such blocks. Although this model captures spatial and temporal features at the same time, it feeds the feature matrix output by the 1D CNN into the GCN, so the GCN captures spatial features less fully than it would by mining them directly from the original input data. ② The CNN structure in ResNet suits regular grid data and is clearly not suitable for general topological graph data. ③ In ResGCN, stacking multiple GCN layers causes over-smoothing, which removes too much of the variation and detail in the data. ④ JK-ResGCN adds a jumping knowledge (JK) network on top of ResGCN, which effectively solves the over-smoothing problem. However, the jumping knowledge network concatenates the output of every GCN layer into the final fusion layer, so as GCN layers are stacked the number of input channels of the fusion layer grows, and the fusion layer spends more time on feature learning.

Table 1 Accuracy comparison

fb72167dc449e6ae5ebd179338bc3821.png

  • The impact of the number of GCN layers

By varying the number of stacked GCN layers, the prediction accuracy (RMSE) for total passenger flow and the training time of four models (ResGCN, JKResGCN, ResNet and Adapt-GCN) were compared in detail. The results are shown in Figure 7. No matter how many GCN layers are stacked, Adapt-GCN has the best prediction accuracy, especially in shallow networks. This is due to the adaptive adjacency matrix S in Adapt-GCN, which lets the model capture the spatial correlation between distant stations even in shallow networks. As the number of stacked layers increases, however, the receptive fields of the other three models gradually expand, and the accuracy advantage of Adapt-GCN becomes less and less pronounced. In terms of training time, Adapt-GCN is slower than the CNN-based ResNet, because CNNs parallelize better during training. Compared with ResGCN, which uses a fixed adjacency matrix, feature learning in Adapt-GCN is more flexible, and although the number of learnable parameters increases, the overall training time decreases.

7961128babe3bc91a78985030d1d2c5d.png 8276a48e23093aa837752591d480e44b.png

Figure 7 Impact of GCN layer

  • Ablation experiment

To evaluate the effectiveness of the components in the model, we designed ablation experiments and tested a variant of Adapt-GCN on the Shanghai subway dataset. Specifically, "Adapt-GCN w/o residual" refers to the Adapt-GCN model with the residual connection block removed. Table 2 shows the prediction results of Adapt-GCN after removing the residual connections. On the Shanghai dataset, Adapt-GCN outperforms the variant without residuals, indicating that residual connections have a positive effect on prediction performance.

Table 2 Ablation experiment

c9bdaaff275d46f25c4c6d5024020a40.png

  • Adjacency matrix analysis

For the Shanghai subway dataset, the physical map (static adjacency matrix) describing the connectivity between stations and the adaptive adjacency matrix learned by a single-layer Adapt-GCN are visualized separately. The comparison is shown in Figure 8. The left panel is the 288×288 physical map of the Shanghai subway dataset, that is, the fixed adjacency matrix A, in which dark cells have the value 1 and bright cells have the value 0. The values encode the connectivity between stations. The right panel shows the adaptive adjacency matrix S obtained by neural network training: the darker the color, the stronger the correlation between two stations, and the lighter the color, the weaker the correlation. Both S and A show a clear diagonal, which represents the strong correlation between adjacent stations. In addition, the value range of S is wider than that of A, which contains only 0 and 1, showing that the adaptive adjacency matrix can learn the correlation between stations more flexibly. Light-colored areas in the left panel correspond to non-zero elements in the right panel, which shows that even a single Adapt-GCN layer can capture the spatial correlation between distant stations.

8c53c55715086afd5806c2ef2bfbed25.png 432870c1b7b06b6d77de159c1497ddce.png

Figure 8 Adjacency matrix comparison
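One simple way to quantify this visual comparison is to measure how much of the learned matrix's weight falls on station pairs that share no physical edge. The sketch below is a hypothetical check, not something reported in the article: the function name `off_graph_weight_share` and the small matrices are invented for illustration.

```python
def off_graph_weight_share(a, s):
    """Share of |S| weight on pairs (i, j), i != j, that are not connected in A."""
    n = len(a)
    total = sum(abs(s[i][j]) for i in range(n) for j in range(n))
    off_graph = sum(abs(s[i][j]) for i in range(n) for j in range(n)
                    if i != j and a[i][j] == 0)
    return off_graph / total if total else 0.0

# Fixed adjacency of a hypothetical line 0 - 1 - 2
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
# Invented "learned" matrix with a small weight between stations 0 and 2
S = [[0.9, 0.4, 0.2],
     [0.4, 0.8, 0.3],
     [0.2, 0.3, 0.7]]

share = off_graph_weight_share(A, S)
print(f"share of weight on non-adjacent pairs: {share:.3f}")
```

A share of zero would mean the learned matrix only reweights existing edges, while a positive share indicates that correlations between non-adjacent stations were picked up.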

  • Conclusion

This paper proposes an improved GCN for short-term passenger flow prediction at multiple stations in a multi-line rail transit network. To account for the topological structure of the spatial distribution of stations, the fixed adjacency matrix in traditional GCN is replaced with a randomly initialized matrix of the same shape, which is optimized through backpropagation. Because the method is no longer constrained by a fixed adjacency matrix and instead lets the GCN model adaptively learn the topological relationship between stations from the data, the spatial correlation between distant stations can be captured. Experiments show that the model outperforms the other existing methods on a real-world dataset. In addition, the randomly initialized matrix does not require the real topological relationship between stations, only the total number of stations, which greatly simplifies preparing the adjacency matrix. Although the model outperforms the other methods, it does not consider how differences in passenger flow patterns between weekdays and weekends affect prediction accuracy. In future work, we will pay more attention to mining temporal correlation and consider temporal periodicity. For example, the morning and evening peaks of each weekday are usually similar, and the passenger flow patterns of weekends are also similar to one another. We will take these temporal factors into account to further improve prediction accuracy.

