tensorflow: Obtain RNN hidden states gradients with respect to input

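My model consists of an Embedding layer and a SimpleRNN layer. I obtained the hidden states at all time steps with model.predict and plotted them against the step index. The hidden states converge to zero, but I am not sure whether anything can be inferred from that. Plotting their gradients with respect to the model input might give me some further insight, so I need help obtaining these gradients.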

My model:

batch_size = 9600   # batch size can take a smaller value, e.g. 100
inp= Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out= Embedding(input_dim, output_dim, input_length= input_length, 
                         weights= [Emat], trainable= False, name= 'embedding')(inp)
rnn= SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
               batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

h0 = tf.random.uniform((batch_size, 200))
rnn_allstates = rnn(emb_out, initial_state=h0)
print(rnn_allstates.shape)   # (9600, 1403, 200)
model_rnn = Model(inputs=inp, outputs= rnn_allstates, name= 'model_rnn')
model_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model_rnn.summary()

>>>
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(9600, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (9600, 1403, 100)         4348900   
_________________________________________________________________
simpleRNN (SimpleRNN)        (9600, 1403, 200)         60200     
=================================================================

Obtaining the hidden states:

rnn_ht = model_rnn.predict(xtr_pad)   # xtr_pad.shape = (9600,1403)
rnn_ht_red= np.mean(rnn_ht, 2)
rnn_ht_red= np.mean(rnn_ht_red,0)
steps= [t for t in range(1403)]
plt.plot(steps, rnn_ht_red, linestyle= 'dotted')

Attempt to obtain the gradients:

sess= k.get_session()
# The hidden states tf.Variable shaped (n_samples = 9600, n_steps = 1403, n_units = 200):
states_var= model_rnn.output  
# A list of hidden states variable for all time steps, aggregated over samples and RNN units:
ht_vars= [states_var[:, t, :] for t in range(1403)]        # each item in list has shape (9600, 200)
ht_vars_agg= [tf.reduce_mean(ht,[0,1]) for ht in ht_vars]  # each item in list has shape (), because I wish to obtain a SINGLE gradient value at each time step.

# Create gradient function and feed data:
dhtdx_vars= [k.gradients(ht, model_rnn.input) for ht in ht_vars_agg]
dhtdx= [sess.run(pd, feed_dict={model_rnn.input: xtr_pad} ) for pd in dhtdx_vars  ]

The error below points to the last line above:

TypeError: Fetch argument None has invalid type <class 'NoneType'>

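Every backend gradient item in dhtdx_vars is [None]. The same error persists when I remove the aggregation line.

I also tried tf.GradientTape, which likewise returns None for the computed gradient: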

with tf.GradientTape() as tape:
    x= model_rnn.input
    ht = model_rnn(x)
grad = tape.gradient(ht, model_rnn.input)

Thanks in advance for your help.

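The problem is that tf.GradientTape() does not propagate gradients through integer inputs: the token ids only index into the embedding matrix, so there is nothing to differentiate with respect to them. That is probably why you are getting None gradients. What you can do instead is compute the gradients with respect to the output of the Embedding layer, as follows: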

import tensorflow as tf

input_length = 1403
inp= tf.keras.layers.Input(shape= (input_length,)) 
emb_out= tf.keras.layers.Embedding(500, 100, input_length= input_length, trainable= False)(inp)
rnn_out = tf.keras.layers.SimpleRNN(200,  return_sequences = True)(emb_out)

model = tf.keras.Model(inputs=inp, outputs=rnn_out)
model.summary()


xte_pad = tf.random.uniform((10, 1403), maxval=500, dtype=tf.int32)
y = tf.random.normal((10, 1403, 200))
ds = tf.data.Dataset.from_tensor_slices((xte_pad, y)).batch(5)

embedding_layer = model.layers[1]
rnn_layer = model.layers[2]
epochs = 1
for epoch in range(epochs):
  for step, (x_batch_train, y_batch_train) in enumerate(ds):
    with tf.GradientTape() as tape:
        # The embedding output is a float tensor, so it can be differentiated.
        embedded_x = embedding_layer(x_batch_train)
        tape.watch(embedded_x)
        states = rnn_layer(embedded_x)   # (batch, 1403, 200)

    # Gradient of the (summed) hidden states w.r.t. the embedded input.
    grads = tape.gradient(states, embedded_x)
    tf.print(grads.shape)

>>>
Model: "model_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_13 (InputLayer)       [(None, 1403)]            0         
                                                                 
 embedding_12 (Embedding)    (None, 1403, 100)         50000     
                                                                 
 simple_rnn_12 (SimpleRNN)   (None, 1403, 200)         60200     
                                                                 
=================================================================
Total params: 110,200
Trainable params: 60,200
Non-trainable params: 50,000
_________________________________________________________________
TensorShape([5, 1403, 100])
TensorShape([5, 1403, 100])
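
Since the question asks for a single gradient value per time step, here is a minimal sketch of one way to get it, using the same trick of differentiating with respect to the embedding output. The toy sizes, the names token_ids, step_means and grad_norms, and the choice of the L2 norm as the per-step summary are my assumptions, not something from the original post; swap in your own layers and whatever aggregate you prefer.

```python
import tensorflow as tf

# Toy sizes, assumed for illustration (the question uses 1403 steps, 200 units).
vocab_size, emb_dim, n_units, n_steps, batch = 50, 8, 16, 6, 4

embedding_layer = tf.keras.layers.Embedding(vocab_size, emb_dim)
rnn_layer = tf.keras.layers.SimpleRNN(n_units, return_sequences=True)

# Random integer token ids standing in for the padded input.
token_ids = tf.random.uniform((batch, n_steps), maxval=vocab_size, dtype=tf.int32)

# persistent=True lets us call tape.gradient once per time step.
with tf.GradientTape(persistent=True) as tape:
    embedded_x = embedding_layer(token_ids)   # float tensor, differentiable
    tape.watch(embedded_x)
    states = rnn_layer(embedded_x)            # (batch, n_steps, n_units)
    # One scalar per time step: mean over samples and units.
    step_means = [tf.reduce_mean(states[:, t, :]) for t in range(n_steps)]

# grads[t] has shape (batch, n_steps, emb_dim): d mean(h_t) / d embedded_x.
grads = [tape.gradient(m, embedded_x) for m in step_means]

# Asking for the gradient w.r.t. the integer ids gives None, as in the question.
id_grad = tape.gradient(step_means[0], token_ids)
del tape  # release the resources held by the persistent tape

# Collapse each gradient tensor to a single number per time step.
grad_norms = [float(tf.norm(g)) for g in grads]
print(id_grad)
print(grad_norms)
```

grad_norms can then be plotted against the step index in the same way as the hidden states in the question. One backward pass per step gets expensive at 1403 steps, so you may want to sample a subset of steps or use a smaller batch.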