Error with dimensionality when fitting a stateful RNN
I am fitting a stateful RNN with an embedding layer to perform binary classification. I am a bit confused about the batch_size and batch_shape required by the functional API.
xtrain_padded.shape = (9600, 1403); xtest_padded.shape = (2400, 1403); ytest.shape = (2400,)
input_dim = size of tokenizer word dictionary
output_dim = 100 from GloVe_100d embeddings
number of SimpleRNN layer units = 200
h0: initial hidden states sampled from random uniform.
h0 object has the same shape as RNN layer hidden states obtained when return_state = True.
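One thing worth noting about h0: Keras expects the initial_state of a SimpleRNN to have shape (batch_size, units), i.e. one state vector per sequence, while a tensor of shape (batch_size, timesteps, units) is what return_sequences=True *returns*. A quick numpy shape sketch (names are illustrative, using the dimensions from this question):

```python
import numpy as np

batch_size, input_length, units = 2400, 1403, 200

# One hidden-state vector per sequence: the shape SimpleRNN accepts
# as initial_state
h0 = np.random.uniform(size=(batch_size, units))

# Per-timestep states (batch, timesteps, units) are what the layer
# RETURNS when return_sequences=True, not what initial_state accepts
all_states_shape = (batch_size, input_length, units)

print(h0.shape, all_states_shape)
```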
Model structure:
batch_size = 2400 # highest common factor of xtrain and xtest
inp= Input(batch_shape= (batch_size, input_length), name= 'input')
emb_out= Embedding(input_dim, output_dim, input_length= input_length,
weights= [Emat], trainable= False, name= 'embedding')(inp)
rnn= SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
h0 = tf.random.uniform((batch_size, input_length, 200))
rnn_out, rnn_state = rnn(emb_out, initial_state=h0)
mod_out= Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model = Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(2400, 1403)] 0
_________________________________________________________________
embedding (Embedding) (2400, 1403, 100) 4348900
_________________________________________________________________
simpleRNN (SimpleRNN) [(2400, 1403, 200), (2400 60200
_________________________________________________________________
dense_3 (Dense) (2400, 1403, 1) 201
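The parameter counts in this summary can be sanity-checked by hand: a SimpleRNN with 200 units over 100-d embeddings has an input kernel, a recurrent kernel, and a bias, and the embedding vocabulary size falls out of the embedding parameter count (all numbers taken from the summary above):

```python
emb_dim, units = 100, 200

# SimpleRNN: input kernel + recurrent kernel + bias
rnn_params = emb_dim * units + units * units + units
print(rnn_params)  # 60200

# Embedding params = vocab_size * emb_dim, so vocab_size here is:
vocab_size = 4348900 // emb_dim
print(vocab_size)  # 43489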
Passing the test data through the model works fine when I call it directly via the model API:
mod_out_allsteps, rnn_ht= model(xte_pad) # Same as the 2 items from model.predict(xte_pad)
print(mod_out_allsteps.shape, rnn_ht.shape)
>> (2400, 1403, 1) (2400, 1403, 200)
However, when I use model.fit, it raises a ValueError about unequal dimensions:
model.fit(xte_pad, yte, epochs=1, batch_size=batch_size, verbose=1)
>>
ValueError: Dimensions must be equal, but are 2400 and 1403 for '{{node binary_crossentropy_1/mul}} = Mul[T=DT_FLOAT](binary_crossentropy_1/Cast, binary_crossentropy_1/Log)' with input shapes: [2400,1], [2400,1403,200].
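The mismatch comes from binary_crossentropy trying to combine labels of shape (2400, 1) with the per-timestep output of shape (2400, 1403, 200); the two cannot be broadcast together. A numpy sketch of the same failing multiply from the traceback:

```python
import numpy as np

y_true = np.zeros((2400, 1))          # labels, one per sequence
y_pred = np.zeros((2400, 1403, 200))  # per-timestep hidden states

try:
    y_true * y_pred  # the Mul node from the traceback
except ValueError as e:
    print("broadcast failed:", e)
```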
The error seems to indicate that, when fitting the data, the model confused the returned hidden states rnn_ht of shape [2400, 1403, 200] with something else. But I will need these states to compute gradients with respect to the initial hidden state, i.e. for t = 1, ..., 1403.
I am confused about the dimensions in a stateful RNN:
- If stateful = True, do we build the model based on one mini-batch? That is, will the first index in every layer's output shape be batch_size?
- What batch_shape should be set for the first (Input) layer? Did I set it correctly?
Thanks in advance for helping with the error and my confusion!
Update:
batch_size = 2400 # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input')
emb_out= tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)
rnn= tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_ht= rnn(emb_out) # hidden states at all steps
print(rnn_ht.shape)
>>>
(2400, 1403, 200)
mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_ht))
# Extract the y_t's and h_t's:
model = tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_ht])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(2400, 1403)] 0
_________________________________________________________________
embedding (Embedding) (2400, 1403, 100) 50000
_________________________________________________________________
simpleRNN (SimpleRNN) (2400, 1403, 200) 60200
_________________________________________________________________
flatten_4 (Flatten) (2400, 280600) 0
_________________________________________________________________
dense_4 (Dense) (2400, 1) 280601
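The Flatten/Dense numbers in this summary also check out: flattening a (1403, 200) sequence gives 1403 * 200 features per sample, and Dense(1) on top adds one weight per feature plus a bias (numbers taken from the summary above):

```python
input_length, units = 1403, 200

flat_features = input_length * units   # Flatten output per sample
dense_params = flat_features * 1 + 1   # one weight per feature + bias

print(flat_features, dense_params)  # 280600 280601
```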
mod_out_allsteps, rnn_ht = model(xte_pad)
print(mod_out_allsteps.shape, rnn_ht.shape)
>>>
(2400, 1) (2400, 1403, 200)
But the error with `model.fit` persists.
Look at the last layer in your model summary. Since you set the parameter return_sequences to True in the RNN layer, you get back a sequence with the same number of timesteps as your input and an output space of 200 for each timestep, hence the shape (2400, 1403, 200), where 2400 is the batch size. If you set this parameter to False, everything should work, because your labels have the shape (2400, 1).
Working example:
import tensorflow as tf
batch_size = 2400 # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input')
emb_out= tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)
rnn= tf.keras.layers.SimpleRNN(200, return_sequences= False, return_state= True, stateful= True,
batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_out, rnn_state = rnn(emb_out)
mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model = tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()
The first output is your binary decision.
Update 1: with Flatten:
import tensorflow as tf
batch_size = 2400 # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input')
emb_out= tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)
rnn= tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_out, rnn_state = rnn(emb_out)
mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_out))
# Extract the y_t's and h_t's:
model = tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()