[PyTorch] Time Series Forecasting with LSTM + Attention

Enterprise Development 2024-11-06 18:00:39

Time Series Forecasting with LSTM + Attention (PyTorch Version)

Introduction

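Long short-term memory (LSTM) networks are a special kind of recurrent neural network (RNN) designed to capture long-range dependencies. Even so, an LSTM can struggle on long sequences, because everything it has seen must be carried forward in a fixed-size hidden state. An attention mechanism lets the model focus on the most relevant parts of the input sequence, which can improve time series forecasting performance.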

Application Scenarios

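Finance: stock price prediction
Meteorology: weather forecasting
Energy management: electricity consumption forecasting
Traffic management: traffic flow prediction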

How It Works

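An LSTM controls the flow of information through forget, input, and output gates, which mitigates the vanishing gradient problem of traditional RNNs. The attention mechanism computes a weighted sum over the hidden states, with the weights normalized by softmax, so the model can automatically select the input positions that are most useful for the current task.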
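
To make the weighted sum concrete, here is a minimal sketch with hand-picked numbers. The hidden states and scores are made up for illustration; in a real model the hidden states come from the LSTM and the scores from a learned layer.

```python
import torch

# Three time steps, each with a 2-dimensional hidden state (made-up values).
hidden_states = torch.tensor([[1.0, 0.0],
                              [0.0, 1.0],
                              [1.0, 1.0]])

# One relevance score per time step (made-up values; normally learned).
scores = torch.tensor([0.0, 0.0, 2.0])

# Softmax turns the scores into positive weights that sum to 1.
weights = torch.softmax(scores, dim=0)

# The context vector is the weighted sum of the hidden states.
context = (weights.unsqueeze(-1) * hidden_states).sum(dim=0)

print(weights)
print(context)
```

The third time step has the highest score, so it receives the largest weight and dominates the context vector.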

Algorithm Flowchart

Algorithm Explanation

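Encoder: the input sequence is encoded by LSTM units, producing a hidden state at every time step.
Hidden states: the hidden states carry the information of the input sequence.
Attention layer: attention weights are computed from the hidden states and used to form a context vector.
Context vector: the context vector is combined with the hidden states to form the decoder input.
Decoder: the decoder generates the output sequence. In the code below, the context vector is fed directly to a single fully connected layer, which plays the role of the decoder.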
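
The steps above can be traced through tensor shapes. The following is a rough sketch, not the article's model: the sizes are arbitrary, the layers are untrained, and the "decoder" is assumed to be a plain linear layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, seq_len, input_dim, hidden_dim = 4, 10, 1, 8  # illustrative sizes

x = torch.randn(batch, seq_len, input_dim)

# Encoder: one hidden state per time step.
lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
hidden_states, _ = lstm(x)                                   # (4, 10, 8)

# Attention layer: score each time step, then normalize over time.
score_layer = nn.Linear(hidden_dim, 1)
weights = torch.softmax(score_layer(hidden_states).squeeze(-1), dim=1)  # (4, 10)

# Context vector: weighted sum of the hidden states.
context = (weights.unsqueeze(-1) * hidden_states).sum(dim=1)  # (4, 8)

# "Decoder": assumed here to be a linear layer giving one value per sequence.
decoder = nn.Linear(hidden_dim, 1)
prediction = decoder(context)                                 # (4, 1)

print(hidden_states.shape, weights.shape, context.shape, prediction.shape)
```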

Detailed Code Example

Data Preparation

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TimeSeriesDataset(Dataset):
    """Sliding-window dataset: seq_length past values -> the next value."""

    def __init__(self, data, seq_length):
        self.data = data
        self.seq_length = seq_length

    def __len__(self):
        # Each sample needs seq_length inputs plus one target value.
        return len(self.data) - self.seq_length

    def __getitem__(self, index):
        x = self.data[index:index+self.seq_length]  # input window
        y = self.data[index+self.seq_length]        # value right after the window
        return torch.tensor(x, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)
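
To see what the sliding window produces, here is a small sketch that applies the same indexing to a six-element series. `make_windows` is a hypothetical helper written only for this illustration; it is not part of the article's code.

```python
import numpy as np

def make_windows(data, seq_length):
    """Hypothetical helper: same indexing as a sliding-window dataset."""
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        xs.append(data[i:i + seq_length])   # input window
        ys.append(data[i + seq_length])     # next value (target)
    return np.array(xs), np.array(ys)

series = np.arange(6, dtype=np.float32)     # values 0, 1, 2, 3, 4, 5
x, y = make_windows(series, seq_length=3)

print(x)  # three windows: (0, 1, 2), (1, 2, 3), (2, 3, 4)
print(y)  # their targets: 3, 4, 5
```

A series of length 6 with a window of 3 yields 6 - 3 = 3 samples, which matches the length formula used by the dataset class.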

Model Definition

import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, hidden_dim):
        super(Attention, self).__init__()
        # Scores each time step's hidden state with a single scalar.
        self.attention = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim)
        scores = self.attention(hidden_states).squeeze(-1)  # (batch, seq_len)
        weights = torch.softmax(scores, dim=1)              # normalize over time steps
        # Weighted sum of hidden states over time: (batch, hidden_dim)
        context_vector = torch.sum(weights.unsqueeze(-1) * hidden_states, dim=1)
        return context_vector

class LSTMWithAttention(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super(LSTMWithAttention, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.attention = Attention(hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        lstm_out, _ = self.lstm(x)                 # (batch, seq_len, hidden_dim)
        context_vector = self.attention(lstm_out)  # (batch, hidden_dim)
        output = self.fc(context_vector)           # (batch, output_dim)
        return output
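
When debugging or visualizing the model, it can help to look at the attention weights themselves. The sketch below is a hypothetical variant of the attention layer that returns the weights alongside the context vector; the class name `AttentionWithWeights` and the random stand-in input are assumptions made for this example.

```python
import torch
import torch.nn as nn

class AttentionWithWeights(nn.Module):
    """Hypothetical variant: same pooling, but also returns the weights."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.attention = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states):
        scores = self.attention(hidden_states).squeeze(-1)   # (batch, seq_len)
        weights = torch.softmax(scores, dim=1)               # sums to 1 over time
        context = torch.sum(weights.unsqueeze(-1) * hidden_states, dim=1)
        return context, weights

torch.manual_seed(0)
layer = AttentionWithWeights(hidden_dim=64)
hidden_states = torch.randn(2, 10, 64)       # stand-in for LSTM output
context, weights = layer(hidden_states)

print(context.shape)   # torch.Size([2, 64])
print(weights.shape)   # torch.Size([2, 10])
```

Each row of `weights` holds one value per time step, so it could be plotted against the input window to see which steps the model relies on.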

Model Training and Testing

def train_model(model, train_loader, criterion, optimizer, epochs):
    model.train()
    for epoch in range(epochs):
        for x_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(x_batch.unsqueeze(-1))
            loss = criterion(outputs, y_batch.unsqueeze(-1))
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch+1}, Loss: {loss.item()}")

def evaluate_model(model, test_loader):
    model.eval()
    predictions, actuals = [], []
    with torch.no_grad():
        for x_batch, y_batch in test_loader:
            outputs = model(x_batch.unsqueeze(-1))
            # squeeze(-1) keeps the batch dimension, so a final batch of size 1 still yields a list
            predictions.extend(outputs.squeeze(-1).tolist())
            actuals.extend(y_batch.tolist())
    return predictions, actuals
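
The two lists returned by the evaluation function are easier to judge as summary metrics than as raw numbers. Here is a minimal sketch, assuming both lists have equal length; `regression_metrics` is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def regression_metrics(predictions, actuals):
    """Hypothetical helper: MSE, RMSE and MAE for two equal-length lists."""
    p = np.asarray(predictions, dtype=np.float64)
    a = np.asarray(actuals, dtype=np.float64)
    err = p - a
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(err))),
    }

# Made-up numbers, only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(metrics)
```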

Main Function

if __name__ == "__main__":
    # Hyperparameters
    seq_length = 10
    input_dim = 1
    hidden_dim = 64
    output_dim = 1
    num_layers = 2
    learning_rate = 0.001
    epochs = 20

    # Dummy dataset
    data = np.sin(np.linspace(0, 50, 500))
    dataset = TimeSeriesDataset(data, seq_length)
    train_size = int(len(dataset) * 0.8)
    test_size = len(dataset) - train_size
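    # Note: random_split shuffles overlapping windows, so test windows share values with training windows
    train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size])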
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32)

    # Model, Criterion and Optimizer
    model = LSTMWithAttention(input_dim, hidden_dim, output_dim, num_layers)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train and Evaluate
    train_model(model, train_loader, criterion, optimizer, epochs)
    predictions, actuals = evaluate_model(model, test_loader)
    print(predictions, actuals)
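
For time series, a chronological split is often preferred over a random one, because it evaluates the model on data that comes strictly after the training period. The sketch below shows one way to do this with `Subset`; the stand-in `TensorDataset` and its sizes are assumptions chosen only to keep the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in dataset of 100 ordered samples (window of 10 values -> 1 target).
x = torch.arange(100, dtype=torch.float32).unsqueeze(-1).repeat(1, 10)
y = torch.arange(100, dtype=torch.float32)
dataset = TensorDataset(x, y)

# Chronological split: the first 80% for training, the last 20% for testing.
train_size = int(len(dataset) * 0.8)
train_dataset = Subset(dataset, range(train_size))
test_dataset = Subset(dataset, range(train_size, len(dataset)))

# Shuffling within the training set is fine; the test set stays in time order.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)

print(len(train_dataset), len(test_dataset))  # 80 20
```

With overlapping windows, samples right at the boundary still share a few raw values; skipping seq_length samples between the two subsets would remove that overlap.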