LSTM for time-series prediction failing to learn (PyTorch)

I am currently building an LSTM network to forecast time-series data using PyTorch. I have tried to share all the code pieces that I thought would be helpful, but please feel free to let me know if there is anything further I can provide. I added some comments at the end of the post about what the underlying issue might be.

Given a univariate time series indexed by date, I created 3 date features and split the data into training and validation sets as shown below.

# X_train
             weekday    monthday    hour
timestamp           
2015-01-08 17:00:00 3   8   17
2015-01-12 19:30:00 0   12  19
2014-12-01 15:30:00 0   1   15
2014-07-26 09:00:00 5   26  9
2014-10-17 20:30:00 4   17  20
... ... ... ...
2014-08-29 06:30:00 4   29  6
2014-10-13 14:30:00 0   13  14
2015-01-03 02:00:00 5   3   2
2014-12-06 16:00:00 5   6   16
2015-01-06 20:30:00 1   6   20
8256 rows × 3 columns

# y_train
                    value
timestamp   
2015-01-08 17:00:00 17871
2015-01-12 19:30:00 20321
2014-12-01 15:30:00 16870
2014-07-26 09:00:00 11209
2014-10-17 20:30:00 26144
... ...
2014-08-29 06:30:00 9008
2014-10-13 14:30:00 17698
2015-01-03 02:00:00 12850
2014-12-06 16:00:00 18277
2015-01-06 20:30:00 19640
8256 rows × 1 columns

# X_val
             weekday    monthday    hour
timestamp           
2015-01-08 07:00:00 3   8   7
2014-10-13 22:00:00 0   13  22
2014-12-07 01:30:00 6   7   1
2014-10-14 17:30:00 1   14  17
2014-10-25 09:30:00 5   25  9
... ... ... ...
2014-09-26 12:30:00 4   26  12
2014-10-08 16:00:00 2   8   16
2014-12-03 01:30:00 2   3   1
2014-09-11 08:00:00 3   11  8
2015-01-15 10:00:00 3   15  10
2064 rows × 3 columns

# y_val
                    value
timestamp   
2014-09-13 13:00:00 21345
2014-10-28 20:30:00 23210
2015-01-21 17:00:00 17001
2014-07-20 10:30:00 13936
2015-01-29 02:00:00 3604
... ...
2014-11-17 11:00:00 15247
2015-01-14 00:00:00 10584
2014-09-02 13:00:00 17698
2014-08-31 13:00:00 16652
2014-08-30 12:30:00 15775
2064 rows × 1 columns
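For reference, date features like these can be derived straight from the DatetimeIndex; a minimal sketch with pandas (the frame below is hypothetical, mirroring two rows of the data above):

```python
import pandas as pd

# Hypothetical frame indexed by timestamp, mirroring two rows above
idx = pd.to_datetime(["2015-01-08 17:00:00", "2014-07-26 09:00:00"])
df = pd.DataFrame(index=idx)

# Derive the three date features from the index
df["weekday"] = df.index.weekday   # Monday=0 .. Sunday=6
df["monthday"] = df.index.day
df["hour"] = df.index.hour
```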

Then, I scaled the values in the datasets with MinMaxScaler from scikit-learn.

scaler = MinMaxScaler()
X_train_arr = scaler.fit_transform(X_train)
X_val_arr = scaler.transform(X_val)
y_train_arr = scaler.fit_transform(y_train)
y_val_arr = scaler.transform(y_val)
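One aside on the snippet above: the same scaler instance is refit on y_train, which overwrites the parameters learned on X_train. That still works here (X was already transformed, and the scaler can later inverse_transform predictions of y), but separate instances would be cleaner. The transformation itself is just x' = (x - min) / (max - min), with min and max taken from the training data only; a sketch of that logic in NumPy, with min_max_scale as a hypothetical helper standing in for MinMaxScaler:

```python
import numpy as np

def min_max_scale(train, other):
    # Learn column-wise min/max on the training data only, then apply to both
    lo, hi = train.min(axis=0), train.max(axis=0)
    return (train - lo) / (hi - lo), (other - lo) / (hi - lo)

y_train = np.array([[10.0], [20.0], [30.0]])
y_val = np.array([[15.0], [25.0]])
y_train_s, y_val_s = min_max_scale(y_train, y_val)
```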

After converting these NumPy arrays into PyTorch tensors, I created iterable datasets using the TensorDataset and DataLoader classes provided by PyTorch.

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)

val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)

train = TensorDataset(train_features, train_targets)
train_loader = DataLoader(train, batch_size=64, shuffle=False)

val = TensorDataset(val_features, val_targets)
val_loader = DataLoader(val, batch_size=64, shuffle=False)

Then, I defined my LSTM model and the train_step function as follows:

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros, on the same device as the input
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device).requires_grad_()
        
        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device).requires_grad_()
        
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        
        # Index hidden state of last time step
        out = self.fc(out[:, -1, :]) 
        return out

def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(y, yhat)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()
    
    # Returns the function that will be called inside the train loop
    return train_step
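One detail worth calling out in the model above: out from nn.LSTM (with batch_first=True) has shape (batch, time steps, hidden units), and out[:, -1, :] keeps only the last time step of each sequence before the readout layer. The selection can be sketched with a NumPy array of the same layout:

```python
import numpy as np

# Stand-in for the LSTM output: (batch=2, time steps=3, hidden=4)
out = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Keep only the hidden state of the last time step per sequence
last = out[:, -1, :]
print(last.shape)  # (2, 4)
```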

Finally, I started training my LSTM model in mini-batches with the Adam optimizer for 20 epochs, which is already long enough to see that the model is not learning.

import torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'

n_features = 3  # weekday, monthday, hour
input_dim = n_features
hidden_dim = 64
layer_dim = 3
output_dim = 1

model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim).to(device)

criterion = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters(), lr=1e-2)

train_losses = []
val_losses = []
train_step = make_train_step(model, criterion, optimizer)
n_epochs = 20

for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.unsqueeze(dim=0).to(device)
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)    
    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.unsqueeze(dim=0).to(device)
            y_val = y_val.to(device)        
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)
    
    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")

Here is the output:

C:\Users\VS32XI\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:446: UserWarning: Using a target size (torch.Size([1, 1])) that is different to the input size (torch.Size([64, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
[1] Training loss: 0.0505    Validation loss: 0.0315
[2] Training loss: 0.0317    Validation loss: 0.0315
[3] Training loss: 0.0317    Validation loss: 0.0315
[4] Training loss: 0.0317    Validation loss: 0.0315
[5] Training loss: 0.0317    Validation loss: 0.0315
[6] Training loss: 0.0317    Validation loss: 0.0315
[7] Training loss: 0.0317    Validation loss: 0.0315
[8] Training loss: 0.0317    Validation loss: 0.0315
[9] Training loss: 0.0317    Validation loss: 0.0315
[10] Training loss: 0.0317   Validation loss: 0.0315
[11] Training loss: 0.0317   Validation loss: 0.0315
[12] Training loss: 0.0317   Validation loss: 0.0315
[13] Training loss: 0.0317   Validation loss: 0.0315
[14] Training loss: 0.0317   Validation loss: 0.0315
[15] Training loss: 0.0317   Validation loss: 0.0315
[16] Training loss: 0.0317   Validation loss: 0.0315
[17] Training loss: 0.0317   Validation loss: 0.0315
[18] Training loss: 0.0317   Validation loss: 0.0315
[19] Training loss: 0.0317   Validation loss: 0.0315
[20] Training loss: 0.0317   Validation loss: 0.0315

Note 1: Looking at the warning shown, I am not sure whether it is the real reason the model is learning poorly. After all, I am trying to predict future values of the time series, so 1 would be a plausible output dimensionality.

Note 2: To train the model in mini-batches, I relied on the DataLoader class. When iterating over the X and Y batches in the train and validation DataLoaders, the x_batches have 2 dimensions while the model expects 3. So I used PyTorch's unsqueeze function to match the expected dimensions, as in x_batch.unsqueeze(dim=0). I am not sure whether this is the right way to do it, which could also be the problem.

The problem was solved once I reshaped the mini-batches of features in the training and validation sets with Tensor.view. As a side note, view() enables fast and memory-efficient reshaping, slicing, and element-wise operations by avoiding an explicit data copy.

It turned out that, in the earlier implementation, torch.unsqueeze() did not reshape the batches into tensors with the dimensions (batch size, time steps, number of features). Instead, unsqueeze(dim=0) returns a new tensor with a singleton dimension inserted at the 0th index.
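The difference is easy to check on shapes alone; illustrated here with NumPy, whose expand_dims and reshape mirror torch.unsqueeze and Tensor.view for this purpose:

```python
import numpy as np

batch_size, n_features = 64, 3
x_batch = np.zeros((batch_size, n_features))   # what the DataLoader yields

# unsqueeze(dim=0)-style: one "sequence" of 64 time steps, batch size 1
wrong = np.expand_dims(x_batch, axis=0)

# view-style: 64 sequences of one time step each
right = x_batch.reshape(batch_size, -1, n_features)

print(wrong.shape, right.shape)  # (1, 64, 3) (64, 1, 3)
```

With a batch dimension of 1, the model emits yhat of shape (1, 1) while y_batch has shape (64, 1), which is exactly the size mismatch the warning above reports.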

Hence, the mini-batches of the feature sets are reshaped as x_batch = x_batch.view([x_batch.size(0), -1, n_features]).to(device), using the tensor's own first dimension rather than a hard-coded batch_size so that a smaller final batch does not break the reshape.

The new training loop then becomes:

for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.view([x_batch.size(0), -1, n_features]).to(device) # <---
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)    
    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.view([x_val.size(0), -1, n_features]).to(device) # <---
            y_val = y_val.to(device)        
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)
    
    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")

Here is the output:

[1] Training loss: 0.0235    Validation loss: 0.0173
[2] Training loss: 0.0149    Validation loss: 0.0086
[3] Training loss: 0.0083    Validation loss: 0.0074
[4] Training loss: 0.0079    Validation loss: 0.0069
[5] Training loss: 0.0076    Validation loss: 0.0069

                          ...

[96] Training loss: 0.0025   Validation loss: 0.0028
[97] Training loss: 0.0024   Validation loss: 0.0027
[98] Training loss: 0.0027   Validation loss: 0.0033
[99] Training loss: 0.0027   Validation loss: 0.0030
[100] Training loss: 0.0023  Validation loss: 0.0028

Recently, I decided to put together what I have learned and what I wish I had known earlier. If you would like to take a look, you can find the link below. I hope you will find it useful. Feel free to comment or reach out if you agree or disagree with any of my remarks above.