AAAI 2021 Paper: Gated Memory Neural Networks

A multidimensional time series is composed of multiple related variables that evolve over time. This data structure is ubiquitous in scientific research and real-world applications. For example, in an e-commerce scenario, the sales of multiple product categories change over time and form a multidimensional time series; in the stock market, the prices of multiple stocks form a multidimensional time series. Extracting information from such data and using it for analysis and prediction is especially important in the current era of big data.

Among machine learning methods, recurrent neural networks (RNNs) are an important class of models for analyzing multidimensional time series. Their main feature is the ability to extract information from data in chronological order and store it as memory within the network. These models (especially the gated variants LSTM and GRU) have achieved great success in speech recognition, dynamic image processing, weather prediction, financial data analysis, and other fields.

Our paper "Memory-Gated Recurrent Networks" was accepted at AAAI 2021 (a CCF-A conference) in December 2020. Targeting the information structure of multidimensional time series, the paper improves on existing recurrent neural network models and strengthens their ability to extract multidimensional information.

1. Research motivation

The difficulty and essence of multidimensional time series information extraction lie in capturing the complex interdependence in the data. In this data structure, the evolution of each variable depends not only on its own historical information (the temporal memory of each variable itself, i.e., the marginal memories), but also on the interactions between variables (the temporal memory of the interactions, i.e., the joint memory). Take the e-commerce example again: changes in the sales of one product category are affected not only by its own seasonality and other factors, but are also strongly correlated with the sales of other categories through factors such as holidays.

Although this informational feature of multidimensional time series is reflected in classical statistical models (such as ARMA-GARCH), it is not exploited by existing machine learning methods. For example, when using an LSTM to predict product sales, we feed the sales of different products into the network without distinction and fully expect the network itself to untangle the complex dependencies. Such an approach is very crude. Instead, we can design a refined structure inside the neural network to extract these two types of memory in the multidimensional time series, namely the marginal memories of the individual variables and the joint memory between variables, thereby reducing the difficulty the network faces in extracting multidimensional information. Based on this idea, we propose a new recurrent neural network structure, named Memory-Gated Recurrent Networks (mGRN).

2. Model structure


Figure 1: Example mGRN structure

Next we introduce the structure of mGRN. We denote a multidimensional time series as X_t and assume it consists of M variables. We divide the variables into K groups, that is, X_t = [X_t^(1), …, X_t^(K)]. In mGRN, marginal-memory components extract the memory information of each variable group (the red part in Figure 1), and the joint-memory component then integrates the information of all variable groups to extract their interactions (the blue part in Figure 1).
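As a concrete illustration of the grouping step only, here is a minimal PyTorch-style sketch; the (batch, time, M) tensor layout and the choice of K = 3 groups of 2 variables each are illustrative assumptions, not details from the paper.

```python
import torch

# Toy input: batch of 32 series, 100 time steps, M = 6 variables.
x = torch.randn(32, 100, 6)

# Grouping is a modelling choice; here we assume K = 3 groups of 2 variables each.
group_sizes = [2, 2, 2]
x_groups = torch.split(x, group_sizes, dim=-1)   # K tensors of shape (batch, time, m_k)

for k, x_k in enumerate(x_groups, start=1):
    print(f"group {k}: {tuple(x_k.shape)}")      # each group is fed to its own marginal component
```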

Among them, the marginal-memory components are designed in the form of a GRU. Specifically, the part used to extract the information of the k-th variable group X_t^(k) is shown in Formula 1 (σ denotes the sigmoid function and ⊙ denotes the element-wise product). The key point of this design is that we explicitly pair the input X_t^(k) with its own memory h_t^(k), which simplifies the network's task of distinguishing and extracting the different kinds of information. This explicit correspondence is missing in existing recurrent neural networks.


Formula 1: Marginal-memory components
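Since Formula 1 itself is not reproduced here, the sketch below only approximates the described design with one standard GRU-style cell per variable group, so that group k's input X_t^(k) is paired only with its own memory h_t^(k); the exact gating of the paper's Formula 1 may differ.

```python
import torch
import torch.nn as nn

class MarginalMemories(nn.Module):
    """One GRU-style memory per variable group (stand-in for the paper's Formula 1)."""

    def __init__(self, group_sizes, hidden_size):
        super().__init__()
        self.cells = nn.ModuleList(nn.GRUCell(m_k, hidden_size) for m_k in group_sizes)

    def step(self, x_groups, h_groups):
        # x_groups[k]: (batch, m_k); h_groups[k]: (batch, hidden_size).
        # Group k only ever sees its own input and its own memory.
        return [cell(x_k, h_k) for cell, x_k, h_k in zip(self.cells, x_groups, h_groups)]
```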

Then we combine the information of the variable groups in the joint-memory component in a non-linear way; the specific form, shown in Formula 2, is a simplified version of a GRU. Because mGRN extracts marginal memory and joint memory separately, it inevitably introduces a large number of intermediate gating variables, and too many intermediate variables can easily lead to overfitting. To address this, we adopt a careful design: we use GRU rather than LSTM as the basic structure (GRU is simpler than LSTM), and we remove the redundant parts of the joint-memory component based on experiments.


Formula 2: The joint-memory component
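Again, Formula 2 is not reproduced above, so the sketch below is only one plausible reading of "a simplified version of GRU": a GRU-style cell without a reset gate acting on the concatenation of the marginal memories. Which gates the paper actually keeps after its ablation is not asserted here.

```python
import torch
import torch.nn as nn

class JointMemory(nn.Module):
    """Simplified GRU-style cell over the concatenated marginal memories (stand-in for Formula 2)."""

    def __init__(self, num_groups, marginal_size, joint_size):
        super().__init__()
        in_size = num_groups * marginal_size + joint_size
        self.update_gate = nn.Linear(in_size, joint_size)
        self.candidate = nn.Linear(in_size, joint_size)

    def step(self, h_groups, h_joint):
        h_marginal = torch.cat(h_groups, dim=-1)      # combine all variable groups
        zc = torch.cat([h_marginal, h_joint], dim=-1)
        z = torch.sigmoid(self.update_gate(zc))       # update gate
        h_tilde = torch.tanh(self.candidate(zc))      # candidate joint memory
        return (1.0 - z) * h_joint + z * h_tilde      # gated update of the joint memory
```

In a full model, the marginal step and the joint step would be unrolled together over time; how the two kinds of memory feed the task-specific output layer follows the paper.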

Finally, there are two more points to discuss about the structure of mGRN.

  1. mGRN extracts information by grouping the variables of a multidimensional time series. How to group the variables can be treated as part of hyperparameter tuning; in our experiments, we have noticed that grouping each variable individually often works well.
  2. In the current paper, we deliberately keep the model structure simple in order to isolate the improvement brought by extracting marginal memory and joint memory separately. The model can easily be combined with other structures (such as CNNs and attention) to achieve better results.

3. Application

mGRN can be applied to any multidimensional time series analysis task. To demonstrate its improvements, we provide comparative experiments in the paper across multiple real application scenarios, including

  1. Forecasting based on multidimensional time series composed of physiological indicators of patients in the intensive care unit (Harutyunyan et al. 2019). Prediction targets include patient survival, length of stay in the ICU, and more.

  2. Spoken digit recognition (Bagnall et al. 2018). The multidimensional time series consists of multiple frequency components of the sound recordings.

  3. Recognition of handwritten digits (Bagnall et al. 2018). The multidimensional time series consists of coordinate changes of handwritten trajectories.

In these applications, we compare against the best results in the existing literature, and mGRN achieves significant improvements in all of them. These experiments are rather involved, so interested readers are referred to our paper. Here, we present an application in a financial scenario: high-frequency stock price prediction based on limit order book data.


Figure 2: Limit Order Book Illustration

The limit order book is a common mechanism in stock markets. Figure 2 shows the state of a limit order book at a given moment. The red numbers in the middle record the prices at which the market is willing to buy and sell, and the white numbers record the number of shares the market is willing to buy or sell at each price. During live trading, these numbers change continuously as orders are submitted and executed, forming a high-frequency multidimensional time series. This data contains information such as the supply of and demand for the stock, based on which we can make short-term predictions about future price changes.

Following Sirignano and Cont (2019), we make predictions based on the historical order book data at each time point; the prediction target is the direction of future stock price changes (i.e., rising or falling), which reduces stock price prediction to a binary classification problem. When applying mGRN, we divide the order book data into four groups: bid prices, bid volumes, ask prices, and ask volumes.
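As an illustration of this setup, the sketch below arranges the order-book features into the four groups and builds binary up/down labels from a mid-price series; the prediction horizon, the use of the mid price, and the array shapes are our own illustrative assumptions, not details from the paper.

```python
import numpy as np

def make_lob_samples(bid_px, bid_vol, ask_px, ask_vol, mid_price, horizon=20):
    """bid_px, bid_vol, ask_px, ask_vol: (T, levels) arrays; mid_price: (T,) array.
    Returns the four feature groups fed to mGRN and binary direction labels."""
    x_groups = [bid_px, bid_vol, ask_px, ask_vol]                            # K = 4 variable groups
    label = (mid_price[horizon:] > mid_price[:-horizon]).astype(np.int64)    # 1 = up, 0 = down
    x_groups = [g[:-horizon] for g in x_groups]                              # align features with labels
    return x_groups, label
```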

We conduct experiments on the domestic A-share market. Our data set spans December 2014 to December 2017: the data from December 2014 to June 2017 is used to train the model, the data from July 2017 to September 2017 to tune hyperparameters (validation set), and the data from October 2017 to December 2017 to compare prediction results (test set). To obtain representative conclusions, we focus on the constituents of the CSI 300 and CSI 500 indexes, remove stocks that were suspended for long periods, and obtain about 300 stocks, from which we randomly select 30 for the experiments. Over the whole data set, each stock has about 4 million sample points.
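The chronological split described above can be expressed as follows, assuming the per-stock samples carry a pandas DatetimeIndex (an assumption about the data format, not a detail from the paper).

```python
import pandas as pd

def split_by_period(samples: pd.DataFrame):
    """Chronological train/validation/test split matching the periods above."""
    train = samples.loc["2014-12-01":"2017-06-30"]   # model training
    val   = samples.loc["2017-07-01":"2017-09-30"]   # hyperparameter tuning (validation set)
    test  = samples.loc["2017-10-01":"2017-12-31"]   # final comparison of prediction results (test set)
    return train, val, test
```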


Table 1: Average Forecast Results for 30 Stocks

We measure prediction performance with two metrics: prediction accuracy and AUC. The average results for the 30 stocks from October 2017 to December 2017 are shown in Table 1. The per-stock improvement of mGRN over LSTM and GRU is shown in the box plots of Figure 3. mGRN achieves a significant and stable improvement over both LSTM and GRU.


Figure 3: Improvement of mGRN in stock price change prediction accuracy (left) and AUC (right) compared to LSTM and GRU
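The two evaluation metrics can be computed with scikit-learn as shown below; y_true and y_prob stand for the test-set labels and the model's predicted probability of an upward move (the names are placeholders, not from the paper).

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """Accuracy uses thresholded predictions; AUC uses the raw predicted probabilities."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
    }
```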


4. Summary

We propose a new recurrent neural network for multidimensional time series, Memory-Gated Recurrent Networks (mGRN). Its main feature is that it separately extracts the temporal memory of each variable (group) and the temporal memory of the interactions between variables. By explicitly assigning gating variables to learn these two types of memory, we reduce the difficulty the neural network faces in extracting high-dimensional memory. Compared with existing machine learning algorithms for high-dimensional time series, mGRN shows significant and comprehensive improvements across multiple application scenarios.


References

[1] Zhang, Y.; Wu, Q.; Peng, N.; Dai, M.; Zhang, J.; and Wang, H. (2021). Memory-Gated Recurrent Networks. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21). arXiv preprint arXiv:2012.13121.

[2] Bagnall, A.; Dau, H. A.; Lines, J.; Flynn, M.; Large, J.; Bostrom, A.; Southam, P.; and Keogh, E. (2018). The UEA multivariate time series classification archive. arXiv preprint arXiv:1811.00075.

[3] Harutyunyan, H.; Khachatrian, H.; Kale, D. C.; Ver Steeg, G.; and Galstyan, A. (2019). Multitask learning and benchmarking with clinical time series data. Scientific Data 6(1): 1-18.

[4] Sirignano, J. and Cont, R. (2019). Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance, pages 1-11.

