Metro OD Flow Prediction Based on Adaptive Feature Fusion Network

1 Overview

This paper appears in IEEE Transactions on Intelligent Transportation Systems (2023), a leading journal in the transportation field. It targets three problems in OD flow prediction: complex and dynamic spatio-temporal dependencies, sparse and incomplete data, and sensitivity to external factors. To address them, it proposes an adaptive feature fusion network (AFFN). AFFN learns the hidden spatial features of OD flow by adaptively fusing multiple graphs, and it captures the periodic evolution of OD flow according to the influence of external factors. To handle the incompleteness and sparsity of the OD matrix, the article extends AFFN to a multi-task AFFN that uses the inbound and outbound flow prediction of subway stations as a subtask to improve OD prediction accuracy. Experiments on two large-scale subway datasets from Nanjing and Xi'an validate the effectiveness of the model.

Article information

Authors: Yuhang Xu, Yan Lyu, Guangwei Xiong, Shuyu Wang

Title: Adaptive Feature Fusion Networks for Origin-Destination Passenger Flow Prediction in Metro Systems

Document source: IEEE Transactions on Intelligent Transportation Systems, 2023.

2 Summary

Accurately predicting OD passenger flow can improve the quality and efficiency of subway services. Existing work mainly studies station-level inbound and outbound flow forecasting and pays little attention to OD forecasting for subway systems. The main challenges are: 1) complex and dynamic spatio-temporal dependencies; 2) OD demand is easily affected by external factors; 3) the OD demand matrix is sparse and incomplete. In this article, the authors propose an adaptive feature fusion network (AFFN) for subway OD flow prediction. It adaptively captures the spatial dependencies of the subway network by fusing multiple knowledge graphs, and it captures the periodic patterns of passenger flow by learning the influence of external factors. To address the sparsity and incompleteness of the OD matrix, the authors extend AFFN to a multi-task AFFN that predicts the inbound and outbound flow of each station as a subtask to further improve OD prediction accuracy. Extensive experiments on two real-world subway datasets from Nanjing and Xi'an show that AFFN and multi-task AFFN outperform the baseline models, and ablation experiments show that AFFN and its key components are effective for OD prediction.

3 Introduction

The subway is one of the most popular and efficient modes of transportation in major cities. With rapid urbanization and population growth, subway systems face highly dynamic travel demand. Operators therefore need to optimize service in a timely manner, for example by arranging flexible train schedules and planning flexible skip-stop routes, which requires accurate OD passenger flow forecasting. OD prediction has been widely studied for taxi and ride-hailing systems, where the task is to predict the number of trips from each origin area to each destination area. However, these techniques cannot be directly applied to subway OD demand forecasting, because stations in a subway network are connected by sparse subway lines, whereas the road network is much denser. This article therefore studies how to accurately predict city-level OD passenger flow in sparse subway networks. City-level subway OD prediction faces the following main challenges:

1) Complex and dynamic spatio-temporal dependencies: OD flows in subway systems are highly dynamic, especially during rush hours, and OD demand can change quickly over short periods. In the spatial dimension, two stations may be correlated because they are close to each other, have similar temporal OD patterns, serve similar urban functions, or share other hidden features that cannot be described explicitly. Capturing these complex spatial and temporal dependencies comprehensively and jointly is therefore crucial.

2) Periodic patterns and external factors: OD flow exhibits clear daily and weekly periodic patterns. At the same time, it is easily affected by external factors such as weather conditions and holidays, which cause abnormal fluctuations. Existing studies model the periodic pattern of OD flow and the influence of external factors separately, and therefore cannot capture how external factors alter the periodic pattern.

3) Incompleteness and sparsity of the OD matrix: subway trips are usually long, often taking more than 30 minutes. Complete OD information becomes available only after a passenger arrives at the destination; the destination cannot be observed in real time, so the real-time OD matrix lacks information on unfinished trips. In addition, the OD matrix is usually very sparse: a small number of station pairs account for most OD trips, and most station pairs have very few trips between them. This incomplete and sparse OD matrix makes prediction harder.

To overcome these challenges, the authors propose an Adaptive Feature Fusion Network (AFFN) that models hidden spatial features by adaptively fusing multiple knowledge graphs and mines the periodic patterns of OD flow by learning the influence of external factors. Specifically, they propose an Enhanced Multi-Graph Convolutional Gated Recurrent Unit (EMGC-GRU), which encodes the spatial dependencies among stations using multiple knowledge-based graphs and an attention-based hidden correlation graph; the graph convolutions are embedded in each GRU layer so that temporal dynamics are captured as well. Next, the periodic OD flows are weighted by attention weights learned from external factors and fused with the real-time prediction from EMGC-GRU through a gating unit. To address the incompleteness and sparsity of the OD matrix, the authors extend AFFN to a multi-task AFFN that takes station inbound and outbound (IO) flow prediction as a secondary task. IO prediction is a much simpler task because the IO matrix is denser and more complete, and it is highly correlated with OD prediction, so a shared IO prediction network can help improve OD prediction accuracy. The main contributions of the article are as follows:

1) An Enhanced Multi-Graph Convolutional Gated Recurrent Unit (EMGC-GRU) is proposed to comprehensively capture the predefined spatial correlations in multiple knowledge-based graphs and to automatically learn the hidden correlations between stations.

2) An attention module based on external factors is proposed to integrate periodic flows, weighted by attention, into the prediction and improve its accuracy.

3) An asymmetric multi-task adaptive feature fusion network (multi-task AFFN) is proposed, which uses a task-shared IO encoder and task-shared external-factor attention to jointly predict OD flow and IO flow and further improve OD prediction accuracy.

4) Extensive experiments on two large-scale datasets show that AFFN and multi-task AFFN are effective for OD passenger flow prediction in subway systems.

4 Preliminary knowledge

This section briefly introduces the main symbols and definitions used in subway OD flow prediction; the basic mathematical notation is listed in Table 1.

Table 1 Key mathematical notation

4.1 Key concepts

1) Trip: A trip record consists of a passenger's origin station, departure time period, destination station, and arrival time period. The article uses such a record to describe one passenger trip.

2) IO flow: The number of passengers entering a subway station is defined as its inbound flow, and the number of passengers leaving a station as its outbound flow; both are counted per station and per time step. The authors use an IO matrix to record the inbound and outbound passenger flow of all stations in a time period. As shown in Figure 1(a), each row of the IO matrix represents a station, the first column records the inbound flow, and the second column records the outbound flow. An IO flow is the time series of IO matrices.

3) OD flow: The authors use two matrices. The first records, for each pair of stations, the number of passengers who depart from the origin station in the time period and travel to the destination station; the second records the number of passengers who arrive at the destination station in the time period after departing from the origin station. As shown in Figure 1(b), the authors concatenate the two matrices to form the OD matrix, which captures the OD trips that depart from or arrive at a station within a single time step. The OD flow is the time series of OD matrices.

4) External factors: Environmental factors, including weather conditions and air quality, affect passengers' choice of transportation and thus the IO flow and OD flow of the subway system. In addition, IO flow and OD flow exhibit different spatio-temporal patterns on holidays, weekends, and weekdays. The authors therefore consider these four external factors to improve OD flow prediction accuracy.

Figure 1 OD flow, IO matrix and OD matrix example
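The definitions above can be made concrete with a small sketch. It assumes that trip records are tuples of integer station and time-step indices and that the two OD matrices are concatenated side by side; the function name `build_matrices`, the tuple layout, and the concatenation axis are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def build_matrices(trips, n_stations, t):
    """Build the IO matrix and the concatenated OD matrix for time step t.

    trips: iterable of (origin, depart_step, destination, arrive_step)
           tuples (hypothetical record layout).
    """
    od_depart = np.zeros((n_stations, n_stations), dtype=int)
    od_arrive = np.zeros((n_stations, n_stations), dtype=int)
    for origin, t_dep, dest, t_arr in trips:
        if t_dep == t:   # trip starts in this time step
            od_depart[origin, dest] += 1
        if t_arr == t:   # trip ends in this time step
            od_arrive[origin, dest] += 1
    inbound = od_depart.sum(axis=1)    # passengers entering each station at t
    outbound = od_arrive.sum(axis=0)   # passengers leaving each station at t
    io = np.stack([inbound, outbound], axis=1)           # shape (N, 2)
    od = np.concatenate([od_depart, od_arrive], axis=1)  # shape (N, 2N), assumed layout
    return io, od

trips = [(0, 0, 1, 1), (0, 0, 2, 2), (1, 0, 2, 1), (2, 1, 0, 1)]
io, od = build_matrices(trips, n_stations=3, t=1)
```

The row and column sums show why the IO matrix is both denser and easier to observe than the OD matrix: inbound counts need only the entry event, whereas each OD entry needs a completed trip.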

4.2 Problem Definition

Given the historical OD flow, IO flow, and external factors, the goal is to predict the OD matrix of the next time step:

Problem 1 (OD flow prediction): Given the historical OD flow and external factors, learn a prediction function that accurately predicts the OD matrix of the next time step.

Because the IO flow equals the sum of OD flows over all destination stations (inbound flow) and over all origin stations (outbound flow), OD flow and IO flow are highly correlated. In addition, IO prediction is a relatively simple task because its input data is less sparse than that of OD prediction. The authors therefore hypothesize that a neural network that can accurately predict IO flow can help OD flow prediction, and treat IO prediction as a subtask of OD prediction:

Problem 2 (joint prediction): Given the historical OD flow, IO flow, and external factors, learn a prediction function that accurately predicts both the OD flow and the IO flow of the next time step.

5 Model

In this study, the authors propose an Adaptive Feature Fusion Network (AFFN) for predicting OD flows between subway stations, as shown in Figure 2(a). AFFN first takes the real-time OD flow of historical time steps as input and predicts an OD estimation matrix. It then combines the sequence of periodic OD matrices from the same time step on previous days with external factors (weather conditions and date attributes) to compute attention weights that calibrate the OD estimation matrix. Finally, a gating unit outputs the final prediction.

Figure 2 Adaptive Feature Fusion Network (AFFN) Framework

5.1 OD prediction based on real-time data stream

This part introduces the basic module of the model, which predicts the OD matrix of the next time step from the OD matrices of historical time steps t-q to t-1. To fully capture the spatio-temporal features of OD flows, the authors propose an Enhanced Multi-Graph Convolutional Gated Recurrent Unit (EMGC-GRU). It first constructs multiple knowledge-based graphs and adopts a Relational Graph Convolutional Network (RGCN) to integrate the multiple relationships between stations. Because some hidden relationships between stations cannot be mined directly from prior knowledge, the authors use a separate graph attention network to capture them. The two graph convolutions, one based on the knowledge graphs and one based on the attention graph, are then embedded into two GRUs to capture the temporal dependencies of real-time OD flow. Finally, the output hidden states of the two GRUs are concatenated and passed through a fully connected layer to generate the final hidden state. The authors stack two EMGC-GRU layers as the encoder for real-time OD flow and use a further graph convolutional layer as the decoder that outputs the OD estimation matrix.

1) Knowledge graph-based spatio-temporal relationship representation learning: The authors define five knowledge graphs to represent different relationships between subway stations. Each node in a graph represents a subway station, and the edges of the five graphs represent topological connectivity, OD connectivity, regional functional similarity, inbound flow similarity, and outbound flow similarity, respectively.

(1) Topology graph: models the physical topology of the subway system, where each edge indicates that two stations are adjacent and directly connected by a subway line. The edge weight matrix is a 0-1 matrix whose entry is 1 if the two stations are adjacent.

(2) OD graph: The cumulative number of passengers traveling from one node (i.e., station) to another within a period of time is used as the edge weight. The greater the weight from one station to another, the stronger the correlation between the two stations, which suggests that many passengers will travel between them in the future.

(3) Functional similarity graph: models the similarity between stations according to the function (commercial, residential, etc.) of the area in which each station is located. Two stations with similar regional functions may have similar temporal patterns of passenger flow. The article uses POI count vectors to represent the regional function of each station and measures the functional similarity between two stations by the cosine similarity of their POI count vectors.

(4) Inbound flow similarity graph and outbound flow similarity graph: model the inbound flow similarity and outbound flow similarity between stations, respectively. Given the inbound and outbound flow sequences of any two stations, the authors measure their similarity with dynamic time warping (DTW).
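Two of the similarity graphs above can be sketched in a few lines. This is a minimal illustration, assuming the flow similarity is measured with dynamic time warping (DTW) and that a DTW distance d is mapped to a similarity 1 / (1 + d); that mapping and all function names are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def cosine_similarity_graph(poi):
    """poi: (N, K) POI count vectors -> (N, N) cosine similarity matrix."""
    norms = np.linalg.norm(poi, axis=1, keepdims=True)
    unit = poi / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return unit @ unit.T

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def dtw_similarity_graph(flows):
    """flows: (N, T) per-station flow series -> (N, N) similarity in (0, 1]."""
    n = flows.shape[0]
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed mapping from distance to similarity.
            sim[i, j] = sim[j, i] = 1.0 / (1.0 + dtw_distance(flows[i], flows[j]))
    return sim

poi = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
flows = np.array([[1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 3.0, 2.0], [5.0, 5.0, 5.0, 5.0]])
func_graph = cosine_similarity_graph(poi)
flow_graph = dtw_similarity_graph(flows)
```

The topology graph and the OD graph need no similarity measure: the former is a 0-1 adjacency matrix, and the latter is a sum of historical OD matrices over the chosen period.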

Because a graph convolutional network (GCN) learns feature representations from a single graph only, the authors use a relational graph convolutional network (RGCN) to integrate the multiple knowledge graphs and learn a unified representation. In RGCN, a node first aggregates features from its neighbors within each graph, and the per-graph results are then aggregated across graphs. In each convolutional layer, the edge weights between stations in each knowledge graph are used for the graph convolution, and the results are summed over all knowledge graphs.
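The aggregate-within-each-graph, then sum-across-graphs idea can be sketched as follows. The row normalization with self-loops and the ReLU activation are assumptions chosen for a compact example; the paper's exact normalization is not given in this summary.

```python
import numpy as np

def normalize_rows(adj):
    """Add self-loops and row-normalize an (N, N) weight matrix."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def multi_graph_conv(x, graphs, weights):
    """One multi-graph convolution layer.

    x:       (N, F_in) station features
    graphs:  list of (N, N) non-negative edge weight matrices, one per knowledge graph
    weights: list of (F_in, F_out) matrices, one per graph
    """
    # Aggregate neighbors inside each graph, then sum over graphs.
    out = sum(normalize_rows(a) @ x @ w for a, w in zip(graphs, weights))
    return np.maximum(out, 0.0)  # ReLU (assumed activation)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
graphs = [rng.random((4, 4)) for _ in range(5)]        # five knowledge graphs
weights = [rng.normal(size=(3, 2)) for _ in range(5)]
h = multi_graph_conv(x, graphs, weights)
```

Giving each graph its own weight matrix lets the layer learn how much each relationship (topology, OD, function, flow similarity) should contribute.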

To integrate station correlations with the temporal dependence of OD flow, the authors further use a GRU to iteratively update the hidden states of the stations. At each time step, the GRU takes the OD matrix of that step as input together with the station features produced by the last RGCN layer, computes the reset gate, the update gate, and the candidate activation, and uses them to update the hidden feature representation. The result is the enhanced hidden feature representation for that time step.
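A graph-convolutional GRU update of this kind can be sketched as below. For brevity the sketch uses a single aggregation matrix in place of the full multi-graph RGCN, omits bias terms, and uses the standard GRU gate equations; these simplifications and the class name `GraphGRUCell` are assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GraphGRUCell:
    """GRU cell whose gates operate on graph-aggregated station features."""

    def __init__(self, graph_op, in_dim, hid_dim, rng):
        self.g = graph_op      # (N, N) aggregation matrix, assumed precomputed
        self.hid_dim = hid_dim
        shape = (in_dim + hid_dim, hid_dim)
        self.w_r, self.w_u, self.w_c = [rng.normal(0.0, 0.1, shape) for _ in range(3)]

    def step(self, x_t, h_prev):
        xh = self.g @ np.concatenate([x_t, h_prev], axis=1)
        r = sigmoid(xh @ self.w_r)                        # reset gate
        u = sigmoid(xh @ self.w_u)                        # update gate
        xrh = self.g @ np.concatenate([x_t, r * h_prev], axis=1)
        c = np.tanh(xrh @ self.w_c)                       # candidate activation
        return u * h_prev + (1.0 - u) * c                 # new hidden state

def run_sequence(cell, xs):
    """xs: (T, N, in_dim) sequence of per-station inputs -> final (N, hid_dim) state."""
    h = np.zeros((xs.shape[1], cell.hid_dim))
    for x_t in xs:
        h = cell.step(x_t, h)
    return h

rng = np.random.default_rng(0)
n = 5
cell = GraphGRUCell(np.full((n, n), 1.0 / n), in_dim=n, hid_dim=4, rng=rng)
h_final = run_sequence(cell, rng.random((6, n, n)))
```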

2) Spatio-temporal relationship representation learning based on a graph attention network: The five predefined knowledge graphs may not capture all possible correlations between stations, and there may be hidden correlations that cannot be represented explicitly. The authors therefore employ a graph attention network (GAT) to learn hidden correlations automatically. Unlike RGCN, GAT can learn the important relationships between nodes without a predefined graph structure. The authors adopt the most general structure: they assume that an edge exists between every pair of nodes and learn the edge weights automatically to capture the hidden dependencies. As with RGCN-GRU, a GRU iteratively updates the hidden feature representation learned by GAT. GAT-GRU outputs the hidden features of each time step, which are then concatenated with the features output by RGCN-GRU.

3) Aggregating knowledge-based and attention-based feature learning: The authors concatenate the hidden feature representations learned by RGCN-GRU and GAT-GRU and pass them through a fully connected layer to generate the final hidden state of the time step.

Figure 2(b) illustrates how the features obtained by knowledge graph convolution and attention graph convolution are fused and updated. Because GAT-GRU makes the representation learned by RGCN-GRU more comprehensive, the whole unit is called the Enhanced Multi-Graph Convolutional Gated Recurrent Unit (EMGC-GRU).
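The attention branch and the fusion step can be sketched together. The attention function follows the standard single-head GAT scoring rule applied to a fully connected graph, and the fusion uses a tanh activation; the head count, the activation, and the function names are assumptions for illustration only.

```python
import numpy as np

def dense_graph_attention(h, w, a_src, a_dst, slope=0.2):
    """Single-head GAT-style attention with an edge between every pair of stations.

    h: (N, F_in) features, w: (F_in, F_out), a_src and a_dst: (F_out,)
    Returns the learned (N, N) attention matrix and the (N, F_out) output.
    """
    z = h @ w
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]   # score for each pair (i, j)
    e = np.where(e > 0, e, slope * e)                 # LeakyReLU
    e = e - e.max(axis=1, keepdims=True)              # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over neighbors j
    return alpha, alpha @ z

def fuse_hidden_states(h_knowledge, h_attention, w, b):
    """Concatenate both branches and apply a fully connected layer."""
    return np.tanh(np.concatenate([h_knowledge, h_attention], axis=1) @ w + b)

rng = np.random.default_rng(0)
h_in = rng.normal(size=(6, 4))
alpha, h_att = dense_graph_attention(
    h_in, rng.normal(size=(4, 3)), rng.normal(size=3), rng.normal(size=3)
)
h_know = rng.normal(size=(6, 3))   # stand-in for the knowledge-graph branch output
h_fused = fuse_hidden_states(h_know, h_att, rng.normal(size=(6, 3)), np.zeros(3))
```

Each row of `alpha` is a learned distribution over all stations, which plays the role of the hidden correlation graph that the predefined knowledge graphs cannot express.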

4) Prediction framework: The authors stack two EMGC-GRU layers as the encoder and use a GCN as the decoder to output the initial prediction. The first layer takes the OD matrix of each time step as input in sequence, and the hidden state it outputs is fed into the second layer for higher-level feature learning. The GCN decoder decodes the hidden features of the time step into an initial prediction, which is further calibrated using periodic OD flows and external factors.

5.2 Integrating periodic OD flow and external factors

OD flows usually show a very clear periodic pattern, such as morning and evening peaks, and exploiting it can improve prediction accuracy. However, passenger flow often fluctuates irregularly due to external factors such as weekends, holidays, and weather conditions. As shown in Figure 3(c), the number of OD trips in the same time period is very similar across weekends, and the number of OD trips during holidays is generally smaller. The article therefore aims to improve prediction accuracy by integrating external factors with the periodic OD matrices.

Figure 3 Hourly OD trips between two Nanjing subway stations from April 1 to May 10, 2014

1) Attention module based on external factors: The authors adopt an attention mechanism to model how external factors affect OD demand in different time periods. Specifically, to calibrate the OD estimation matrix produced by the previous step, they consider the periodic OD matrices from the same time step on past days. Because days affected by the same external factors usually have similar OD flow, the authors compute the attention between the external factors at the current time step and the external factors at the same time step on each past day. The resulting attention weights are applied to the periodic OD matrices, and the weighted OD matrices of the past days are aggregated into a calibration prediction. Figure 4 shows the calculation in more detail; interested readers can refer to the original paper.

2) Calibrated prediction based on gating: For passenger flow with clear periodic characteristics, correcting the prediction with periodic flow can improve accuracy. For data with weak periodic patterns, however, a prediction based directly on the real-time flow may be more accurate. The authors therefore use a gating unit that learns automatically how much periodic information and how much real-time OD flow information to use. The final prediction is obtained by weighting the two predictions with a trainable gating weight matrix.
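The two steps above can be sketched as follows. The sketch assumes scaled dot-product attention between external-factor vectors and an element-wise sigmoid gate; the projection matrices `w_q` and `w_k`, the gate parameterization, and the function names are hypothetical, since the summary does not give the exact formulas.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def periodic_estimate(ext_now, ext_past, od_past, w_q, w_k):
    """Weight the periodic OD matrices of past days by external-factor attention.

    ext_now:  (E,) external factors at the target time step
    ext_past: (D, E) external factors at the same time step on D past days
    od_past:  (D, N, M) periodic OD matrices of those days
    """
    q = ext_now @ w_q                        # query from the current factors
    k = ext_past @ w_k                       # one key per past day
    alpha = softmax(k @ q / np.sqrt(q.shape[0]))
    return alpha, np.tensordot(alpha, od_past, axes=1)   # weighted sum over days

def gated_fusion(od_realtime, od_periodic, gate_logits):
    """Blend both estimates with a trainable element-wise gate in (0, 1)."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))
    return g * od_realtime + (1.0 - g) * od_periodic

rng = np.random.default_rng(0)
days, n_ext, n = 4, 3, 5
od_past = rng.random((days, n, 2 * n))
ext_past = np.tile(rng.random(n_ext), (days, 1))   # identical conditions on all days
alpha, od_per = periodic_estimate(
    rng.random(n_ext), ext_past, od_past,
    rng.normal(size=(n_ext, 8)), rng.normal(size=(n_ext, 8)),
)
od_rt = rng.random((n, 2 * n))
od_final = gated_fusion(od_rt, od_per, np.zeros((n, 2 * n)))
```

With a gate of 0.5 everywhere (zero logits), the output is the plain average of the two estimates; training moves each entry's gate toward whichever source is more reliable for that OD pair.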

Figure 4 Attention module based on external factors

5.3 Loss function

The authors define a loss function that minimizes the error between the predicted and true values. Metro operators usually care most about OD pairs with high demand. To focus on these pairs, the authors define a mask operation that masks out OD demand below a certain threshold, so that the loss considers only OD pairs with high demand.

Note that the interpretation of the two indices as origin station and destination station depends on which part of the concatenated OD matrix an entry belongs to (Fig. 1(b)).
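A masked loss of this kind can be sketched as below. The summary does not state which error measure the paper uses, so mean squared error is an assumption here, as are the function name and the choice to return 0 when no entry passes the threshold.

```python
import numpy as np

def masked_mse(pred, target, threshold):
    """Mean squared error over entries whose true demand reaches the threshold."""
    mask = target >= threshold
    if not mask.any():
        return 0.0   # assumed convention when nothing passes the mask
    return float(((pred - target)[mask] ** 2).mean())

target = np.array([[0.0, 10.0], [20.0, 1.0]])
pred = np.array([[5.0, 12.0], [17.0, 0.0]])
loss = masked_mse(pred, target, threshold=5.0)
```

Only the two high-demand entries (10 and 20) contribute, so the large relative error on the near-empty pairs does not dominate training.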

6 Multi-task Network

Even after integrating periodic patterns and external factors, accurate real-time OD prediction remains challenging for two reasons. 1) Sparse OD matrix. The Xi'an network contains 160 stations, and the density of its OD matrix is only 13.27%; that is, only 13.27% of the OD pairs carry any passenger flow. 2) Incomplete data. Passengers typically make long trips that span multiple time steps. Complete OD information for a trip is available only once the passenger completes it, not while the passenger is still traveling, so the real-time OD matrix lacks unfinished trips. In contrast, the IO matrix is denser and more complete, and IO flow has been shown to be predicted more accurately. The IO matrix is in fact the sum of the OD demand over each origin station and each destination station (Fig. 1(b)), and the authors hypothesize that a network that can accurately predict IO flow can help predict the OD matrix. They therefore propose a multi-task network that jointly predicts IO and OD flows.

6.1 Multi-task network framework

Figure 5 depicts the architecture of the multi-task network, which consists of two Adaptive Feature Fusion Networks (AFFNs), one for IO prediction and one for OD prediction. As in the single-task OD prediction of Figure 2(a), each AFFN module first takes a sequence of real-time IO (OD) matrices as input and uses EMGC-GRU to learn a task-specific feature representation; this part is called the IO (OD) encoder. The IO (OD) predictor uses a GCN layer to decode the feature representation into an IO (OD) estimation matrix. Next, the estimation matrix is calibrated by the attention module and the gating unit using the periodic IO (OD) matrices and external factors. The network finally outputs the IO prediction and the OD prediction.

To improve the accuracy of OD prediction, the authors build an additional Co-IO encoder module composed of EMGC-GRU. This module also takes the real-time IO flow as input, but its output is a feature representation shared by the IO predictor and the OD predictor, so that the relevant inbound and outbound flow information can be fused with the OD matrix to improve the OD predictor. The two tasks also share the external-factor attention module: the attention weights are updated by both prediction tasks so as to capture the passenger flow periodicity common to IO flow and OD flow.

Figure 5 Multi-task adaptive feature fusion network

6.2 Loss function

The authors first define two task-specific loss functions, one for each task, each minimizing the error between the predicted and true values.

Note that the IO passenger flow is in fact the sum of the OD flow at the origin station or the destination station, and preserving this relationship in the joint prediction helps improve accuracy. To this end, the authors define a cross-task loss function that minimizes the error between the sum of the predicted OD flows over all destinations and the actual inbound flow of each origin station, and likewise the error with respect to the outbound flow.

The loss function of the final multi-task AFFN is the weighted sum of all loss terms: two weights scale the two task-specific losses, and two further weights scale the cross-task losses.
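The weighted sum can be sketched as follows. For simplicity the sketch uses a square origin-by-destination OD matrix rather than the concatenated layout, mean squared error for every term, and placeholder weight values; all of these, and the parameter names, are assumptions for illustration.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def multitask_loss(od_pred, od_true, io_pred, io_true,
                   w_od=1.0, w_io=1.0, w_in=0.1, w_out=0.1):
    """Weighted sum of two task losses and two cross-task consistency losses.

    od_*: (N, N) matrices with origins as rows and destinations as columns
    io_*: (N, 2) matrices with inbound flow in column 0 and outbound in column 1
    """
    loss_od = mse(od_pred, od_true)                      # OD task
    loss_io = mse(io_pred, io_true)                      # IO task
    loss_in = mse(od_pred.sum(axis=1), io_true[:, 0])    # row sums vs. inbound flow
    loss_out = mse(od_pred.sum(axis=0), io_true[:, 1])   # column sums vs. outbound flow
    return w_od * loss_od + w_io * loss_io + w_in * loss_in + w_out * loss_out

od_true = np.array([[0.0, 3.0, 1.0], [2.0, 0.0, 0.0], [4.0, 1.0, 0.0]])
io_true = np.stack([od_true.sum(axis=1), od_true.sum(axis=0)], axis=1)
perfect = multitask_loss(od_true, od_true, io_true, io_true)
noisy = multitask_loss(od_true + 1.0, od_true, io_true, io_true)
```

The cross-task terms penalize an OD prediction whose row or column sums disagree with the observed station flows, which ties the sparse OD task to the denser IO signal.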

7 Experimental Discussion

In the experimental part, the authors conducted a range of experiments on two subway datasets from Nanjing and Xi'an to verify the prediction accuracy of the model and the contribution of each module. These include a comparison of single-task AFFN and multi-task AFFN, ablation experiments on periodic flow data and external factors, a comparison of prediction performance on different types of subway networks, visual analysis of the prediction results, and an analysis of the model's computational efficiency. The experimental results are not discussed in detail here; interested readers can refer to the original paper.

8 Conclusion

This paper proposes an Adaptive Feature Fusion Network (AFFN) for predicting OD flows in urban rail transit systems. To comprehensively capture the complex spatio-temporal dependencies in OD flows, the authors first propose an Enhanced Multi-Graph Convolutional Gated Recurrent Unit (EMGC-GRU) that fuses knowledge-based and hidden correlations between stations. They additionally propose an attention module based on external factors to accurately capture periodic features. To further improve prediction accuracy, they propose an asymmetric multi-task framework that jointly predicts OD flow and IO flow. The evaluation results show that the proposed method outperforms the baseline models.

Future research directions are mainly as follows: 1) extend the single-step forecasting model to multi-step forecasting; 2) predict finer-grained passenger flow by fusing more detailed travel information; 3) study how to apply the model to more complex subway line layouts, such as loop lines; 4) improve prediction accuracy by incorporating other transportation modes, such as buses and taxis.

9 Follow

If you also work in rail transit, road traffic, or urban planning, you can add WeChat: Dr_JinleiZhang with the remark "Join the group" to join the traffic big data exchange group. Let's make progress together!
