Error with dimensionality when fitting a stateful RNN

I am fitting a stateful RNN with an embedding layer to perform binary classification. I am a bit confused about the batch_size and batch_shape required in the functional API.
xtrain_padded.shape = (9600, 1403); xtest_padded.shape = (2400, 1403); ytest.shape = (2400,)
input_dim = size of tokenizer word dictionary
output_dim = 100 from GloVe_100d embeddings
number of SimpleRNN layer units = 200
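For context, a minimal sketch of how an embedding matrix like Emat is typically built from GloVe vectors and a fitted tokenizer (the names tokenizer and glove.6B.100d.txt are assumptions, not from the original post):

import numpy as np

# Assumed: `tokenizer` is a fitted Keras Tokenizer and glove.6B.100d.txt is on disk.
embeddings = {}
with open('glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        embeddings[values[0]] = np.asarray(values[1:], dtype='float32')

input_dim = len(tokenizer.word_index) + 1  # size of the tokenizer word dictionary (+1 for the padding index)
output_dim = 100                           # GloVe 100d
Emat = np.zeros((input_dim, output_dim))
for word, idx in tokenizer.word_index.items():
    vec = embeddings.get(word)
    if vec is not None:
        Emat[idx] = vec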

h0: initial hidden states, sampled from a random uniform distribution.
The h0 object has the same shape as the RNN layer's hidden state obtained when return_state = True.
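For a SimpleRNN the state returned with return_state = True is 2-D, (batch_size, units), not 3-D; a quick sketch with toy sizes to confirm the shapes:

import tensorflow as tf

# With return_state=True a SimpleRNN returns the full sequence plus the final state;
# the state has shape (batch, units), so h0 must be sampled with that shape.
x = tf.random.uniform((4, 10, 100))  # (batch, timesteps, features)
seq, state = tf.keras.layers.SimpleRNN(200, return_sequences=True, return_state=True)(x)
print(seq.shape, state.shape)  # (4, 10, 200) (4, 200)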

Model structure:

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, Dense
from tensorflow.keras.models import Model

batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403  # number of padded timesteps
inp= Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out= Embedding(input_dim, output_dim, input_length= input_length, 
                         weights= [Emat], trainable= False, name= 'embedding')(inp)

rnn= SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

h_0 = tf.random.uniform((batch_size, 200))  # same shape as the RNN state: (batch_size, units)
rnn_out, rnn_state = rnn(emb_out, initial_state= h_0)
mod_out= Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model = Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(2400, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (2400, 1403, 100)         4348900   
_________________________________________________________________
simpleRNN (SimpleRNN)        [(2400, 1403, 200), (2400 60200     
_________________________________________________________________
dense_3 (Dense)              (2400, 1403, 1)           201       

Fitting the test data to the model is fine when I use the model API:
mod_out_allsteps, rnn_ht= model(xte_pad)  # Same as the 2 items from model.predict(xte_pad) 
print(mod_out_allsteps.shape, rnn_ht.shape) 
>> (2400, 1403, 1) (2400, 1403, 200)

However, when I use model.fit, it raises a ValueError about unequal dimensions:
model.fit(xte_pad, yte, epochs =1, batch_size = batch_size, verbose = 1)
>>
    ValueError: Dimensions must be equal, but are 2400 and 1403 for '{{node binary_crossentropy_1/mul}} = Mul[T=DT_FLOAT](binary_crossentropy_1/Cast, binary_crossentropy_1/Log)' with input shapes: [2400,1], [2400,1403,200].

The error seems to indicate that, when fitting the data, the model is confusing the returned hidden states rnn_ht of shape [2400, 1403, 200] with something else. But I will need these states to compute the gradients with respect to the initial hidden state, i.e. for t = 1, ..., 1403.
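The underlying mechanics: the model has two outputs but fit received a single target array, so Keras tries to apply binary_crossentropy to the rnn_ht output of shape (2400, 1403, 200) as well, which is where the mismatch comes from. One hedged workaround, if both outputs must stay in the model, is to attach the loss only to the classification head via a per-output loss dict (the layer name 'clf' is an assumption, not from the original code):

# Name the classification head and compile with a loss dict keyed on layer names;
# outputs missing from the dict (here the RNN states) get no loss.
mod_out = Dense(1, activation='sigmoid', name='clf')(rnn_out)
model = Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam',
              loss={'clf': 'binary_crossentropy'},
              metrics={'clf': 'acc'})
# The head must still emit one prediction per sample (see the answer below);
# the labels of shape (2400,) then line up with the (2400, 1) output.
model.fit(xte_pad, {'clf': yte}, epochs=1, batch_size=batch_size)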

I am confused about the dimensions in a stateful RNN:

  1. If stateful = True, do we build the model on a single mini-batch?
     That is, will the first index in each layer's output shape be batch_size?
  2. What batch_shape should be set in the first (Input) layer? Did I set it correctly? (See the sketch after this list.)
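For what it's worth, a small self-contained sketch with toy sizes illustrating both points: with stateful = True the graph is built for one fixed mini-batch, so the leading dimension of every layer's output shape is batch_size, and the Input layer has to pin it with batch_shape:

import tensorflow as tf

batch_size, timesteps, features = 8, 5, 16
inp = tf.keras.layers.Input(batch_shape=(batch_size, timesteps, features))
out = tf.keras.layers.SimpleRNN(32, stateful=True)(inp)
m = tf.keras.Model(inp, out)
m.summary()       # every output shape starts with the fixed batch size, 8
m.reset_states()  # stateful layers carry state across batches; reset between epochs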

Thanks in advance for helping with the error and with my confusion!


Update:

import tensorflow as tf

batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)

rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_ht= rnn(emb_out)  # hidden states at all steps 
print(rnn_ht.shape)
>>> 
(2400, 1403, 200)

mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_ht))
# Extract the y_t's and h_t's:
model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_ht])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(2400, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (2400, 1403, 100)         50000     
_________________________________________________________________
simpleRNN (SimpleRNN)        (2400, 1403, 200)         60200     
_________________________________________________________________
flatten_4 (Flatten)          (2400, 280600)            0         
_________________________________________________________________
dense_4 (Dense)              (2400, 1)                 280601    


mod_out_allsteps, rnn_ht= model(xte_pad)   
print(mod_out_allsteps.shape, rnn_ht.shape)  
>>> 
(2400, 1) (2400, 1403, 200)

But the error with `model.fit` persists.

Look at the last layer in your model summary. Since you set the parameter return_sequences to True in the RNN layer, you get a sequence with the same number of timesteps as your input and an output space of 200 for each timestep, hence the shape (2400, 1403, 200), where 2400 is the batch size. If you set this parameter to False, everything should work, since your labels have the shape (2400, 1).

Working example:

import tensorflow as tf

batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)

rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= False, return_state= True, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

rnn_out, rnn_state = rnn(emb_out)
mod_out=  tf.keras.layers.Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

The first output is your binary decision.
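A minimal usage sketch for this model (assuming the padded arrays from the question): the first element returned by predict is the per-sample probability, which can be thresholded at 0.5 for the binary class:

probs, last_state = model.predict(xte_pad, batch_size=batch_size)
labels = (probs > 0.5).astype('int32')  # binary decision, shape (2400, 1)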

Update 1 (using Flatten):

import tensorflow as tf

batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)

rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

rnn_out, rnn_state = rnn(emb_out)
mod_out=  tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_out))
# Extract the y_t's and h_t's:
model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()