LSTM for time-series prediction failing to learn (PyTorch)
I am currently building an LSTM network to forecast time-series data with PyTorch. I have tried to share all the code snippets I think are relevant, but feel free to ask if I can provide anything else. I have added some comments about potential issues at the end of the post.
Given univariate time-series data indexed by date, I created 3 date features and split the data into training and validation sets, as shown below.
# X_train
weekday monthday hour
timestamp
2015-01-08 17:00:00 3 8 17
2015-01-12 19:30:00 0 12 19
2014-12-01 15:30:00 0 1 15
2014-07-26 09:00:00 5 26 9
2014-10-17 20:30:00 4 17 20
... ... ... ...
2014-08-29 06:30:00 4 29 6
2014-10-13 14:30:00 0 13 14
2015-01-03 02:00:00 5 3 2
2014-12-06 16:00:00 5 6 16
2015-01-06 20:30:00 1 6 20
8256 rows × 3 columns
# y_train
value
timestamp
2015-01-08 17:00:00 17871
2015-01-12 19:30:00 20321
2014-12-01 15:30:00 16870
2014-07-26 09:00:00 11209
2014-10-17 20:30:00 26144
... ...
2014-08-29 06:30:00 9008
2014-10-13 14:30:00 17698
2015-01-03 02:00:00 12850
2014-12-06 16:00:00 18277
2015-01-06 20:30:00 19640
8256 rows × 1 columns
# X_val
weekday monthday hour
timestamp
2015-01-08 07:00:00 3 8 7
2014-10-13 22:00:00 0 13 22
2014-12-07 01:30:00 6 7 1
2014-10-14 17:30:00 1 14 17
2014-10-25 09:30:00 5 25 9
... ... ... ...
2014-09-26 12:30:00 4 26 12
2014-10-08 16:00:00 2 8 16
2014-12-03 01:30:00 2 3 1
2014-09-11 08:00:00 3 11 8
2015-01-15 10:00:00 3 15 10
2064 rows × 3 columns
# y_val
value
timestamp
2014-09-13 13:00:00 21345
2014-10-28 20:30:00 23210
2015-01-21 17:00:00 17001
2014-07-20 10:30:00 13936
2015-01-29 02:00:00 3604
... ...
2014-11-17 11:00:00 15247
2015-01-14 00:00:00 10584
2014-09-02 13:00:00 17698
2014-08-31 13:00:00 16652
2014-08-30 12:30:00 15775
2064 rows × 1 columns
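For reference, here is a minimal sketch of how such date features could be derived and split. The DataFrame name df, the 'value' column, and the 80/20 shuffled split are my assumptions, not code from the original project:

import pandas as pd
from sklearn.model_selection import train_test_split

# df: DataFrame with a DatetimeIndex and one column named 'value' (assumed)
df['weekday'] = df.index.weekday    # Monday=0 ... Sunday=6
df['monthday'] = df.index.day       # day of month, 1..31
df['hour'] = df.index.hour          # hour of day, 0..23

X = df[['weekday', 'monthday', 'hour']]
y = df[['value']]

# 80/20 split; shuffle=True matches the non-chronological row order shown above (assumed)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=True)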
I then scaled the values in the datasets using MinMaxScaler from scikit-learn.
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_arr = scaler.fit_transform(X_train)
X_val_arr = scaler.transform(X_val)
y_train_arr = scaler.fit_transform(y_train)
y_val_arr = scaler.transform(y_val)
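As a side note on this step, the targets are scaled as well, so predictions eventually have to be mapped back to the original units. A small sketch of how that could look; the variable names are placeholders of mine, and it relies on the fact that the same scaler object was refit on y_train last, so it currently holds the fit for the target column:

import numpy as np

# Map scaled predictions back to the original units of 'value'.
y_pred_scaled = np.array([[0.25], [0.50]])        # placeholder predictions, shape (n, 1) (assumed)
y_pred = scaler.inverse_transform(y_pred_scaled)  # back to the original scale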
After converting these NumPy arrays into PyTorch tensors, I created iterable datasets using the TensorDataset and DataLoader classes provided by PyTorch.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
train_features = torch.Tensor(X_train_arr)
train_targets = torch.Tensor(y_train_arr)
val_features = torch.Tensor(X_val_arr)
val_targets = torch.Tensor(y_val_arr)
train = TensorDataset(train_features, train_targets)
train_loader = DataLoader(train, batch_size=64, shuffle=False)
val = TensorDataset(val_features, val_targets)
val_loader = DataLoader(val, batch_size=64, shuffle=False)
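As a quick sanity check (my own addition, not part of the original code), inspecting a single batch shows why the shapes matter later: the DataLoader yields 2-D feature batches, while an LSTM with batch_first=True expects 3-D input of shape (batch, seq_len, features).

x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape)   # torch.Size([64, 3])  -> 2-D: (batch, features)
print(y_batch.shape)   # torch.Size([64, 1])
# The LSTM defined below expects (batch, seq_len, features), i.e. here (64, 1, 3).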
I then defined my LSTM model and a train_step function as follows:
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        # Index hidden state of last time step
        out = self.fc(out[:, -1, :])
        return out
def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(y, yhat)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()

    # Returns the function that will be called inside the train loop
    return train_step
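Before wiring everything into the training loop, a dummy forward pass (again my own sanity check, not from the original post) illustrates the model's input contract: with batch_first=True it expects (batch, seq_len, features) and returns one prediction per sample.

model_check = LSTMModel(input_dim=3, hidden_dim=64, layer_dim=3, output_dim=1)
dummy = torch.randn(64, 1, 3)       # (batch, seq_len, features)
print(model_check(dummy).shape)     # torch.Size([64, 1]) -> one prediction per sample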
Finally, I started training my LSTM model in mini-batches with the Adam optimizer for 20 epochs, which is long enough to see that the model is not learning.
import numpy as np
import torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'

n_features = 3  # weekday, monthday, hour
input_dim = n_features
hidden_dim = 64
layer_dim = 3
output_dim = 1

model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim).to(device)
criterion = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters(), lr=1e-2)

train_losses = []
val_losses = []
train_step = make_train_step(model, criterion, optimizer)
n_epochs = 20
for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.unsqueeze(dim=0).to(device)
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)

    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.unsqueeze(dim=0).to(device)
            y_val = y_val.to(device)
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)

    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")
Here is the output:
C:\Users\VS32XI\Anaconda3\lib\site-packages\torch\nn\modules\loss.py:446: UserWarning: Using a target size (torch.Size([1, 1])) that is different to the input size (torch.Size([64, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
[1] Training loss: 0.0505 Validation loss: 0.0315
[2] Training loss: 0.0317 Validation loss: 0.0315
[3] Training loss: 0.0317 Validation loss: 0.0315
[4] Training loss: 0.0317 Validation loss: 0.0315
[5] Training loss: 0.0317 Validation loss: 0.0315
[6] Training loss: 0.0317 Validation loss: 0.0315
[7] Training loss: 0.0317 Validation loss: 0.0315
[8] Training loss: 0.0317 Validation loss: 0.0315
[9] Training loss: 0.0317 Validation loss: 0.0315
[10] Training loss: 0.0317 Validation loss: 0.0315
[11] Training loss: 0.0317 Validation loss: 0.0315
[12] Training loss: 0.0317 Validation loss: 0.0315
[13] Training loss: 0.0317 Validation loss: 0.0315
[14] Training loss: 0.0317 Validation loss: 0.0315
[15] Training loss: 0.0317 Validation loss: 0.0315
[16] Training loss: 0.0317 Validation loss: 0.0315
[17] Training loss: 0.0317 Validation loss: 0.0315
[18] Training loss: 0.0317 Validation loss: 0.0315
[19] Training loss: 0.0317 Validation loss: 0.0315
[20] Training loss: 0.0317 Validation loss: 0.0315
Note 1: Looking at the warning, I am not sure whether it is the real reason the model is not learning well. After all, I am trying to predict future values of the time series, so 1 would seem to be a reasonable output dimension.
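To make the warning concrete, here is a small illustration of mine (not from the original post) of what the broadcasting does: with an input of shape (64, 1) and a target of shape (1, 1), the single value is compared against all 64 entries, so the loss no longer measures per-sample errors.

import torch
import torch.nn.functional as F

inp = torch.randn(64, 1)     # stands in for the batch of true targets y_batch
tgt = torch.randn(1, 1)      # stands in for the single prediction yhat from the unsqueezed batch
loss = F.mse_loss(inp, tgt)  # tgt is broadcast to (64, 1): one value compared to all 64 entries
print(loss)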
Note 2: To train the model in mini-batches, I relied on the DataLoader class. When iterating over the X and y batches in the training and validation DataLoaders, the x_batches have 2 dimensions, while the model expects 3. So I used PyTorch's unsqueeze function to match the expected dimensions, as in x_batch.unsqueeze(dim=0). I am not sure whether this is the right way to do it, and it could also be the source of the problem.
The problem was solved once I reshaped the mini-batches of features in the training and validation sets with Tensor.view(). As a side note, view() enables fast and memory-efficient reshaping, slicing, and element-wise operations by avoiding an explicit copy of the data.

It turned out that in the earlier implementation torch.unsqueeze() did not reshape the batches into tensors with the dimensions (batch size, time steps, number of features). Instead, unsqueeze(dim=0) returns a new tensor with a singleton dimension inserted at the 0th index, so a batch of shape (64, 3) became (1, 64, 3): a single sample with 64 time steps, rather than 64 samples with one time step each.

The mini-batches of the feature sets are therefore reshaped as follows: x_batch = x_batch.view([batch_size, -1, n_features]).to(device)
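A small shape comparison (my own illustration) makes the difference visible for a feature batch of shape (64, 3):

x = torch.randn(64, 3)              # one DataLoader batch: (batch, features)
print(x.unsqueeze(dim=0).shape)     # torch.Size([1, 64, 3])  -> 1 sample, 64 time steps
print(x.view([64, -1, 3]).shape)    # torch.Size([64, 1, 3])  -> 64 samples, 1 time step each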
The new training loop then becomes:
batch_size = 64   # must match the batch size used in the DataLoaders

for epoch in range(n_epochs):
    batch_losses = []
    for x_batch, y_batch in train_loader:
        x_batch = x_batch.view([batch_size, -1, n_features]).to(device)  # <---
        y_batch = y_batch.to(device)
        loss = train_step(x_batch, y_batch)
        batch_losses.append(loss)
    training_loss = np.mean(batch_losses)
    train_losses.append(training_loss)

    with torch.no_grad():
        batch_val_losses = []
        for x_val, y_val in val_loader:
            x_val = x_val.view([x_val.size(0), -1, n_features]).to(device)  # <--- last batch may be smaller
            y_val = y_val.to(device)
            model.eval()
            yhat = model(x_val)
            val_loss = criterion(y_val, yhat).item()
            batch_val_losses.append(val_loss)
        validation_loss = np.mean(batch_val_losses)
        val_losses.append(validation_loss)

    print(f"[{epoch+1}] Training loss: {training_loss:.4f}\t Validation loss: {validation_loss:.4f}")
Here is the output:
[1] Training loss: 0.0235 Validation loss: 0.0173
[2] Training loss: 0.0149 Validation loss: 0.0086
[3] Training loss: 0.0083 Validation loss: 0.0074
[4] Training loss: 0.0079 Validation loss: 0.0069
[5] Training loss: 0.0076 Validation loss: 0.0069
...
[96] Training loss: 0.0025 Validation loss: 0.0028
[97] Training loss: 0.0024 Validation loss: 0.0027
[98] Training loss: 0.0027 Validation loss: 0.0033
[99] Training loss: 0.0027 Validation loss: 0.0030
[100] Training loss: 0.0023 Validation loss: 0.0028
Recently, I decided to put together the things I have learned and the things I wish I had known earlier. If you would like to take a look, you can find the link below. I hope you find it useful. Feel free to comment or reach out if you agree or disagree with any of the remarks I made above.