Problem with dimensionality in Keras RNN - reshape isn't working?
Let's consider a random dataset on which I want to run an RNN:
import random
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import SGD
import numpy as np
df_train = random.sample(range(1, 100), 50)
I want to apply an RNN with a lag equal to 1. I'll use my own function:
def create_dataset(dataset, lags):
    dataX, dataY = [], []
    for i in range(lags):
        subdata = dataset[i:len(dataset) - lags + i]
        dataX.append(subdata)
    dataY.append(dataset[lags:len(dataset)])
    return np.array(dataX), np.array(dataY)
It shrinks the dataset according to the number of lags and outputs two numpy arrays: the first one contains the independent variables, the second one the dependent variable.
x_train, y_train = create_dataset(df_train, lags = 1)
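For reference, a quick shape check (a small sketch, using only the x_train and y_train created above) shows what this call produces:
# With 50 samples and lags = 1, the loop runs once and collects a single
# 49-element slice, so both arrays come out with shape (1, 49).
print(x_train.shape)  # (1, 49)
print(y_train.shape)  # (1, 49)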
But now when I try to fit the model:
model = Sequential()
model.add(SimpleRNN(1, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=SGD(lr = 0.1))
history = model.fit(x_train, y_train, epochs=1000, batch_size=50, validation_split=0.2)
I get the error:
ValueError: Error when checking input: expected simple_rnn_18_input to have 3 dimensions, but got array with shape (1, 49)
I have read about this, and the solution is supposed to be simply applying a reshape:
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
But when I apply it, I get the error:
ValueError: Error when checking input: expected simple_rnn_19_input to have shape (1, 1) but got array with shape (1, 49)
I'm not sure where the mistake is. Could you tell me what I'm doing wrong?
What you call lags is known in the literature as look back. This technique allows feeding the RNN more context data so it can learn mid/long range dependencies.
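As a minimal sketch of the idea (the toy series and look_back value below are purely illustrative, not taken from the question), a look back of 3 turns a series into sliding windows plus the value that follows each window:
# Toy illustration of look back: each input window holds the previous
# "look_back" values and the target is the value that comes next.
series = [10, 20, 30, 40, 50, 60]
look_back = 3
windows = [series[i:i + look_back] for i in range(len(series) - look_back)]
targets = [series[i + look_back] for i in range(len(series) - look_back)]
# windows -> [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
# targets -> [40, 50, 60]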
The error tells you that you are feeding the layer (which expects shape 1x1) a dataset of shape 1x49.
There are two reasons behind the error:
The first one is that your create_dataset is building a bunch of 1x(50 - lags) = 1x49 vectors, which is the opposite of what you want, namely 1x(lags) = 1x1. In particular, this line is responsible:
subdata = dataset[i:len(dataset) - lags + i]

# with lags = 1 you have just one
# iteration in range(1): i = 0
subdata = dataset[0:50 - 1 + 0]
subdata = dataset[0:49] # which is a 1x49 vector

# In order to create the right vector
# you need to change your function:
def create_dataset(dataset, lags = 1):
    dataX, dataY = [], []
    # iterate to a max of (50 - lags - 1) times
    # because we need "lags" elements in each vector
    for i in range(len(dataset) - lags - 1):
        # get "lags" elements from the dataset
        subdata = dataset[i:i + lags]
        dataX.append(subdata)
        # get only the last label representing
        # the current element iteration
        dataY.append(dataset[i + lags])
    return np.array(dataX), np.array(dataY)
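As a usage sketch (assuming the df_train list from the question and the corrected function above), the shapes now work out like this before the data is fed to the RNN:
# With 50 samples and lags = 1 the loop runs 48 times,
# so x_train has shape (48, 1) and y_train has shape (48,).
x_train, y_train = create_dataset(df_train, lags=1)
print(x_train.shape, y_train.shape)  # (48, 1) (48,)

# SimpleRNN expects 3D input: (samples, timesteps, features).
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
print(x_train.shape)  # (48, 1, 1)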
If you use look back in your RNN, you also need to increase the input dimensions, because you are now also looking at previous samples. The network indeed needs more data than a single sample, since it has to "look back" over more samples to learn mid/long range dependencies.
This is more conceptual than practical here; there is no problem in your code, since lags = 1:
model.add(SimpleRNN(1, input_shape=(1, 1)))
# you should use lags in the input shape
model.add(SimpleRNN(1, input_shape=(1, LAGS)))
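Putting both points together, a minimal end-to-end sketch with a look back larger than 1 could look like the following (LAGS = 3 is just an illustrative choice; the dataset builder is the corrected one from above):
import random
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import SGD

# Corrected dataset builder: windows of "lags" values as inputs,
# the value that follows each window as the target.
def create_dataset(dataset, lags=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - lags - 1):
        dataX.append(dataset[i:i + lags])
        dataY.append(dataset[i + lags])
    return np.array(dataX), np.array(dataY)

LAGS = 3  # illustrative look back value
df_train = random.sample(range(1, 100), 50)

x_train, y_train = create_dataset(df_train, lags=LAGS)  # (46, 3), (46,)
# Reshape to (samples, timesteps, features) = (samples, 1, LAGS).
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))

model = Sequential()
model.add(SimpleRNN(1, input_shape=(1, LAGS)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.1))
model.fit(x_train, y_train, epochs=100, batch_size=8, validation_split=0.2)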