How to interpret the output of a RNN with Keras?

Question

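I would like to use an RNN for time series prediction, using 96 backwards steps to predict 96 steps into the future. For this I have the following code: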

#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
helpValueStrides =  int(steps_backwards /steps_forward)

#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])

# standardize data

data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)


scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)


# Prepare the input data for the RNN

series_reshaped_X =  np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y =  np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])


timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)

X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards] 
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards] 
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards] 

   
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:] 
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:] 
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]                                
   
   
# Build the model and train it

np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)

model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
keras.layers.SimpleRNN(10, return_sequences=True),
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides), 
keras.layers.TimeDistributed(keras.layers.Dense(1))
])

model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))

#Predict the test data
Y_pred = model.predict(X_test)

prediction_lastValues_list=[]

for i in range (0, len(Y_pred)):
  prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))

# Create the dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100


# Inverse the scaling (traInv: transformation inversed)

data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed =  np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards] 
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))

# Create the dataframe for the inverse-transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTranformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])

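Here is some test data (the actual values are made up; only the shape matters): Download test data

How should I interpret the Y_pred output? Which of those values gives me the prediction for 96 steps into the future? I have attached screenshots of the 'Y_pred' data, once with 5 output neurons in the last layer and once with only 1. Can anyone tell me how to interpret the 'Y_pred' data, meaning what exactly is the RNN predicting? I can use any number of neurons in the output (last) layer of the RNN model. The 'Y_pred' data always has the shape (number of samples in X_test, timesteps, number of output neurons). My question concerns the last dimension. I thought these might be the features, but that is not true in my case, as I only have 1 output feature (you can see this in the shapes of Y_train, Y_test, and the other target data).

**Reminder**: The bounty is expiring soon and unfortunately I have not received any answer yet, so I would like to remind you of the question and the bounty. I would highly appreciate every comment.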

Answer 1

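It might be useful to walk through the model inputs and outputs in detail.

When the keras.layers.SimpleRNN layer is used with return_sequences=True, its output is a 3-D tensor where axis 0 is the batch size, axis 1 is the timestep, and axis 2 is the number of hidden units (10 for both SimpleRNN layers in your model).

The Conv1D layer produces an output tensor whose last dimension is the number of filters (16 in your model), since it is just convolved with the input. Because helpValueStrides = int(96 / 96) = 1, the kernel size and the stride are both 1, so the number of timesteps stays at 96.

With keras.layers.TimeDistributed, the supplied layer (Dense(1) in the example) is applied independently to each timestep of each sample in the batch. So for 96 timesteps, each record in the batch has 96 outputs.

So stepping through your model: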

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10, return_sequences=True), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16)
    keras.layers.TimeDistributed(keras.layers.Dense(1)) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
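
The timestep counts in the comments above rely on the Conv1D layer not shortening the sequence. As a rough check, assuming Keras's default padding="valid" and no dilation, the output length is floor((T - kernel_size) / strides) + 1. The helper name conv1d_output_length is made up for illustration and is not a Keras function; the 192-step case is a hypothetical configuration, not one from the question.

```python
def conv1d_output_length(timesteps, kernel_size, strides):
    """Output length of a 1-D convolution with 'valid' padding and no dilation."""
    return (timesteps - kernel_size) // strides + 1

steps_backwards = 96
steps_forward = 96
helpValueStrides = int(steps_backwards / steps_forward)  # 1 in the question

# Kernel size and stride are both 1, so all 96 timesteps survive.
print(conv1d_output_length(steps_backwards, helpValueStrides, helpValueStrides))

# Hypothetical case: 192 input steps and 96 targets would give a stride of 2.
print(conv1d_output_length(192, 2, 2))
```

Both calls give 96, which is why the Conv1D layer in this architecture can be used to match the number of output timesteps to the number of target timesteps.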

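To answer your question: the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it is easier to conceptualize, for the 1-output case you can apply np.squeeze to the result of model.predict, which makes the output 2-D: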

Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)

This way you have a rectangular matrix where each row corresponds to a sample in the batch and column i holds the prediction for timestep i.
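
A small NumPy-only sketch of that row/column layout, using a made-up array in place of the real model.predict output. It also shows one caveat worth keeping in mind: a bare np.squeeze removes every length-1 axis, so passing axis=-1 is the safer choice if the batch could ever contain a single sample.

```python
import numpy as np

# Stand-in for model.predict(X_test): 4 samples, 96 timesteps, 1 output unit.
Y_pred = np.arange(4 * 96, dtype=float).reshape(4, 96, 1)

Y_pred_2d = np.squeeze(Y_pred, axis=-1)  # drop only the last axis
print(Y_pred_2d.shape)  # (4, 96)

# Row = sample, column = forecast step.
sample_2_step_10 = Y_pred_2d[2, 10]
print(sample_2_step_10 == Y_pred[2, 10, 0])  # True

# With a single sample, a bare np.squeeze would also drop the batch axis.
single = Y_pred[:1]
print(np.squeeze(single).shape)           # (96,)
print(np.squeeze(single, axis=-1).shape)  # (1, 96)
```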

In the loop after the prediction step, all timestep predictions other than the first are discarded:

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))

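This means the end result is just a list of the predictions for the first timestep of each sample in the batch. If you want the prediction for the 96th timestep, you can do: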

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))

Note the -1 in the second bracket instead of 0, which ensures we take the last predicted timestep rather than the first.
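
The same selection can also be written with NumPy slicing instead of a Python loop. This is only a sketch on random placeholder data with the same (samples, timesteps, 1) layout; the variable names first_step, last_step and loop_last are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Y_pred = rng.normal(size=(5, 96, 1))  # placeholder for model.predict(X_test)

first_step = Y_pred[:, 0, 0]   # same values as the loop with Y_pred[i][0][0]
last_step = Y_pred[:, -1, 0]   # same values as the loop with Y_pred[i][-1][0]

# Cross-check against the loop form used above.
loop_last = [Y_pred[i][-1][1 - 1] for i in range(len(Y_pred))]
print(np.allclose(last_step, loop_last))
```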

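As a side note, to replicate the results I had to make one change to your code, specifically in the creation of series_reshaped_X and series_reshaped_Y. I got an exception when using np.array to create the array from the list: ValueError: cannot copy sequence with size 192 to array axis with dimension 3. Looking at what you were doing (joining the tensors along a new axis), I changed it to np.stack, which accomplishes the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):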

series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy() for i in
                              range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy() for i in
                              range(len(data) - (steps_backwards + steps_forward))])
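
To make the resulting shapes concrete, here is a scaled-down sketch of the same windowing with np.stack. It assumes a plain NumPy array instead of the DataFrame, tiny window sizes (3 back, 2 forward) in place of 96 and 96, and column 0 as the target, mirroring indexWithYLabelsInData = 0.

```python
import numpy as np

steps_backwards, steps_forward = 3, 2          # small stand-ins for 96 and 96
window = steps_backwards + steps_forward
data = np.arange(10 * 3, dtype=float).reshape(10, 3)  # 10 rows, 3 features

# One window per start index, stacked along a new leading axis.
windows = np.stack([data[i:i + window] for i in range(len(data) - window)])
print(windows.shape)  # (5, 5, 3)

X = windows[:, :steps_backwards]          # (5, 3, 3): inputs
Y = windows[:, steps_backwards:, :1]      # (5, 2, 1): target is column 0
print(X.shape, Y.shape)
```

Note that range(len(data) - window) stops one short of the final complete window; using len(data) - window + 1 would include it.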

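Update

"What do those 5 values represent, when I only have 1 target feature?"

It's really just the broadcasting feature of the TensorFlow API (which is also a NumPy feature). If you perform an arithmetic operation on two tensors with different shapes, it will try to make them compatible. In this case, if you change the output layer size to 5 instead of 1 (keras.layers.Dense(5)), the output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5) instead of (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1), which just means the output of the convolutional layer feeds into 5 neurons instead of 1. When the loss (mean squared error) between the two is computed, the label tensor of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1) is broadcast to the size of the prediction tensor, (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5). Here the broadcast is done by replicating the column. For example, if Y_train has [-1.69862224] in the first row at the first timestep and Y_pred has [-0.6132075, -0.6621697, -0.7712653, -0.60011995, -0.48753992] in the first row at the first timestep, then to perform the subtraction the entry in Y_train is converted to [-1.69862224, -1.69862224, -1.69862224, -1.69862224, -1.69862224].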
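
The broadcast can be reproduced in plain NumPy with the numbers quoted above. This is only an illustration of the broadcasting rule; the way Keras reduces the loss internally may differ in detail.

```python
import numpy as np

# One timestep of one sample, using the numbers quoted above.
y_true = np.array([[[-1.69862224]]])                      # shape (1, 1, 1)
y_pred = np.array([[[-0.6132075, -0.6621697, -0.7712653,
                     -0.60011995, -0.48753992]]])         # shape (1, 1, 5)

diff = y_pred - y_true                   # y_true is broadcast along the last axis
print(diff.shape)                        # (1, 1, 5)

# Making the replication explicit gives exactly the same result.
explicit = y_pred - np.broadcast_to(y_true, y_pred.shape)
print(np.array_equal(diff, explicit))    # True

mse = np.mean(diff ** 2)                 # every one of the 5 units is penalised
print(mse)
```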

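"Which of those 5 values is the "right" value to choose for the prediction 96 timesteps ahead?"

There really is no "right" value when training this way. As mentioned above, it's just a feature of the API. All of the outputs should converge to the single target value for the timestep, since they are all compared against that value, so you could technically train this way, but it only adds parameters and complexity to the model (and you would still have to pick one to be the "real" prediction). The proper way to get the prediction for 96 timesteps ahead is detailed in the original answer, but to reiterate: the model output contains the predictions for the future timesteps for each sample in the batch. You can iterate over the output tensor to retrieve the prediction for each timestep of each sample. Also, make sure the number of neurons in the final dense layer matches the number of target values you are trying to predict, or else you'll run into the broadcasting issue (and the "right" output will be unclear).

Just to be exhaustive (and I do not recommend this), if you really want several neurons in the output despite having only one target value, you could do something like averaging the results: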

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(np.mean(Y_pred[i][0]))

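But there is absolutely no advantage to this approach, so I'd suggest sticking with the previous recommendation.

Update 2

"Is my model only predicting one timeslot, 96 timesteps into the future, or is it also predicting everything in between?" The model is predicting everything in between. So for a sample at timestep t, the output of the model is the prediction for [t + 1, t + 2, ..., t + NUMBER_OF_TIMESTEPS]. As per my original answer, "the output tensor from your model contains the predicted values for 96 steps into the future, for each sample". To make this explicit in your evaluation code, you could do something like: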

Y_pred = np.squeeze(Y_pred)
predictions_for_all_samples_and_timesteps = Y_pred.tolist()

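This results in a list of length BATCH_SIZE, where each element is a list of length NUMBER_OF_TIMESTEPS (to be clear, predictions_for_all_samples_and_timesteps is a list of lists). The element at index i of predictions_for_all_samples_and_timesteps contains the predictions for each of timesteps 1 to 96 for the i-th sample (row) in X_test.

As a side note, you can omit np.squeeze, but then you'll have a list of lists of lists, where each element of the inner lists is a one-item list: instead of [[1, 2, 3, ...], ...], the output will look like [[[1], [2], [3], ...], ...].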
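
A toy version of both nestings, with 3 samples and 4 timesteps standing in for the real sizes, and each prediction of one sample paired with its horizon t + 1 ... t + N. The array contents are made up for illustration.

```python
import numpy as np

Y_pred = np.arange(3 * 4, dtype=float).reshape(3, 4, 1)  # 3 samples, 4 steps

nested = np.squeeze(Y_pred, axis=-1).tolist()
print(len(nested), len(nested[0]))   # 3 4
print(nested[1])                     # [4.0, 5.0, 6.0, 7.0]

unsqueezed = Y_pred.tolist()
print(unsqueezed[1][:2])             # [[4.0], [5.0]]

# Pair each prediction of sample 1 with its forecast horizon.
for horizon, value in enumerate(nested[1], start=1):
    print(f"t+{horizon}: {value}")
```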

Update 3

Y_test and Y_pred are both 3-D NumPy arrays of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). To compare them, you can take the absolute (or squared) difference between the two:

abs_diff = np.abs(Y_pred - Y_test)

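This results in an array of the same dimensions, (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). You can then iterate over the rows and generate a plot of the error per timestep for each row.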

import matplotlib.pyplot as plt

for diff in abs_diff:
    print(diff.shape)  # (NUMBER_OF_TIMESTEPS, 1)
    plt.plot(range(len(diff)), diff)  # x axis: timestep index
plt.show()

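It might get a bit unwieldy for large batch sizes (which it looks like you have, judging by the image), so you might plot only a subset of the rows. You can also convert the absolute difference into a percentage error if you'd like to plot that: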

percentage_diff = abs_diff / Y_test

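This is the absolute difference relative to the actual value, as I saw you originally did in Pandas; multiply by 100 to express it in percent. This NumPy array has the same dimensions, so you can iterate over it and generate plots in the same way.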
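
If one curve per sample is too much, a possible alternative is to average the error over all samples for each forecast step. The sketch below uses made-up data with the same 3-D layout; the names mae_per_step and pct_per_step are illustrative, and the percentage version assumes the actual values are bounded away from zero, which is not guaranteed for standardized data.

```python
import numpy as np

rng = np.random.default_rng(1)
Y_test = rng.normal(loc=10.0, scale=1.0, size=(200, 96, 1))   # made-up targets
Y_pred = Y_test + rng.normal(scale=0.5, size=Y_test.shape)    # made-up forecasts

abs_diff = np.abs(Y_pred - Y_test)                 # (200, 96, 1)
mae_per_step = abs_diff.mean(axis=0).squeeze(-1)   # (96,): one value per horizon

# Dividing by values near zero blows up, which is likely with standardized data,
# so this sketch divides by |Y_test| and assumes it is bounded away from zero.
pct_per_step = 100 * (abs_diff / np.abs(Y_test)).mean(axis=0).squeeze(-1)

print(mae_per_step.shape, pct_per_step.shape)
```

Plotting mae_per_step against the step index then gives a single curve showing how the error changes with the forecast horizon.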

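For future queries, please open a new question and provide the link. I'm happy to keep helping, but I'd like to keep gaining reputation from it rather than answering in comments.

Answer 2

I only disagree with @danielcahall on one point: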

The output tensor from your model contains the predicted values for 96 steps into the future, for each sample

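The output does contain 96 timesteps, one for each input timestep, and you can declare the output to mean anything you want. But this is not a good model for what you are trying to do. The main reason is that the RNN you are using is unidirectional.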

x   x   x   x   x   x    # input
|   |   |   |   |   | 
x-->x-->x-->x-->x-->x    # SimpleRNN
|   |   |   |   |   | 
x-->x-->x-->x-->x-->x    # SimpleRNN
|  /|\ /|\ /|\ /|\  | 
| / | \ | \ | \ | \ |
x   x   x   x   x   x    # Conv
|   |   |   |   |   | 
x   x   x   x   x   x    # Dense -> output

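So the first time index of the output only sees the first 2 input times (thanks to the Conv); it cannot see the later times. The first prediction is based only on old data. Only the last few outputs can see all the inputs.

use 96 backwards steps to predict 96 steps into the future

Most of the outputs simply can't see all of the data.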
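
The one-directional flow of information can be checked with a hand-written tanh RNN in NumPy. This is a sketch with random weights and an illustrative function name, simple_rnn_forward; it is not Keras's SimpleRNN implementation. Changing only the second half of the input leaves the outputs for the first half untouched.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b):
    """Forward pass of a plain tanh RNN; returns the hidden state at every step."""
    h = np.zeros(U.shape[0])
    outputs = []
    for x_t in x:                        # strictly left to right
        h = np.tanh(x_t @ W + h @ U + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(3, 10))  # 3 input features -> 10 units
U = rng.normal(scale=0.3, size=(10, 10))
b = np.zeros(10)

x = rng.normal(size=(96, 3))
x_changed = x.copy()
x_changed[48:] += 1.0                    # alter only the second half of the input

out = simple_rnn_forward(x, W, U, b)
out_changed = simple_rnn_forward(x_changed, W, U, b)

print(np.allclose(out[:48], out_changed[:48]))   # True: early outputs never saw the change
print(np.allclose(out[48:], out_changed[48:]))   # False: later outputs did
```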

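This model would be appropriate if you were trying to predict 1 step into the future from each input time.

To predict 96 steps into the future, it would be more reasonable to drop return_sequences=True from the last SimpleRNN layer and remove the Conv layer, then widen the Dense layer to make the prediction: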

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10), # output size is (BATCH_SIZE, 10)
    keras.layers.Dense(96) # output size is (BATCH_SIZE, 96)
])

This way all 96 predictions see all 96 inputs.
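
One practical consequence, sketched here with placeholder arrays rather than the real data: this model outputs a 2-D (BATCH_SIZE, 96) array, so the 3-D targets from the question would need their trailing length-1 axis removed before training to avoid a shape mismatch or unintended broadcasting. The array sizes 500 and 20 are arbitrary.

```python
import numpy as np

Y_train = np.zeros((500, 96, 1))          # placeholder with the question's target shape
Y_train_2d = Y_train[..., 0]              # (500, 96), matching a Dense(96) output
print(Y_train_2d.shape)

Y_pred = np.zeros((20, 96))               # placeholder for model.predict(X_test)
Y_pred_3d = Y_pred[..., np.newaxis]       # back to (20, 96, 1) to compare with Y_test
print(Y_pred_3d.shape)
```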

See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.

Also, SimpleRNN is pretty bad. Never use it for more than a few steps.
