tensorflow: Obtain RNN hidden-state gradients with respect to the input
My model consists of an embedding layer followed by a SimpleRNN layer. I obtained the hidden states at all time steps with model.predict and plotted them against the steps. I found that the hidden states converge to zero, but I am not sure anything can be inferred from that, so plotting their gradients with respect to the model input might give me some further insight. I need some help with obtaining these gradients.
My model:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, SimpleRNN
from tensorflow.keras.models import Model

# input_length = 1403; Emat is the pretrained embedding matrix of shape
# (input_dim, output_dim), with output_dim = 100.
batch_size = 9600  # the batch size can take a smaller value, e.g. 100
inp = Input(batch_shape=(batch_size, input_length), name='input')
emb_out = Embedding(input_dim, output_dim, input_length=input_length,
                    weights=[Emat], trainable=False, name='embedding')(inp)
rnn = SimpleRNN(200, return_sequences=True, return_state=False, stateful=True,
                name='simpleRNN')
h0 = tf.random.uniform((batch_size, 200))  # random initial hidden state
rnn_allstates = rnn(emb_out, initial_state=h0)
print(rnn_allstates.shape)  # (9600, 1403, 200)

model_rnn = Model(inputs=inp, outputs=rnn_allstates, name='model_rnn')
model_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model_rnn.summary()
>>>
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(9600, 1403)] 0
_________________________________________________________________
embedding (Embedding) (9600, 1403, 100) 4348900
_________________________________________________________________
simpleRNN (SimpleRNN) (9600, 1403, 200) 60200
=================================================================
Obtaining the hidden states:
import numpy as np
import matplotlib.pyplot as plt

rnn_ht = model_rnn.predict(xtr_pad)  # xtr_pad.shape = (9600, 1403)
rnn_ht_red = np.mean(rnn_ht, 2)      # average over the 200 RNN units
rnn_ht_red = np.mean(rnn_ht_red, 0)  # average over the 9600 samples
steps = [t for t in range(1403)]
plt.plot(steps, rnn_ht_red, linestyle='dotted')
My attempt at obtaining the gradients:
from tensorflow.compat.v1.keras import backend as k  # TF1-style session API

sess = k.get_session()
# The hidden-states tensor, shaped (n_samples = 9600, n_steps = 1403, n_units = 200):
states_var = model_rnn.output
# A list of hidden-state tensors, one per time step; each item has shape (9600, 200):
ht_vars = [states_var[:, t, :] for t in range(1403)]
# Aggregate over samples and RNN units; each item has shape (), because I wish
# to obtain a SINGLE gradient value at each time step:
ht_vars_agg = [tf.reduce_mean(ht, [0, 1]) for ht in ht_vars]
# Create the gradient ops and feed the data:
dhtdx_vars = [k.gradients(ht, model_rnn.input) for ht in ht_vars_agg]
dhtdx = [sess.run(pd, feed_dict={model_rnn.input: xtr_pad}) for pd in dhtdx_vars]
The error below points to the last line above:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
Each backend-gradients item in dhtdx_vars is [None], and the same error persists when I remove the aggregation line.
I also tried using GradientTape, but the gradient it computes is None as well:
with tf.GradientTape() as tape:
    x = model_rnn.input
    ht = model_rnn(x)
grad = tape.gradient(ht, model_rnn.input)
Thanks in advance for your help.
The problem is that tf.GradientTape() does not propagate gradients through integer inputs, which is probably why you are getting None gradients. What you can do instead is compute the gradients with respect to the output of the Embedding layer:
import tensorflow as tf

input_length = 1403
inp = tf.keras.layers.Input(shape=(input_length,))
emb_out = tf.keras.layers.Embedding(500, 100, input_length=input_length, trainable=False)(inp)
rnn_out = tf.keras.layers.SimpleRNN(200, return_sequences=True)(emb_out)
model = tf.keras.Model(inputs=inp, outputs=rnn_out)
model.summary()

xte_pad = tf.random.uniform((10, 1403), maxval=500, dtype=tf.int32)
y = tf.random.normal((10, 1403, 200))

ds = tf.data.Dataset.from_tensor_slices((xte_pad, y)).batch(5)

embedding_layer = model.layers[1]
rnn_layer = model.layers[2]

epochs = 1
for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(ds):
        with tf.GradientTape() as tape:
            embedded_x = embedding_layer(x_batch_train)  # float output of the embedding
            tape.watch(embedded_x)                       # differentiate w.r.t. this tensor
            y = rnn_layer(embedded_x)
        grads = tape.gradient(y, embedded_x)
        tf.print(grads.shape)
Model: "model_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_13 (InputLayer) [(None, 1403)] 0
embedding_12 (Embedding) (None, 1403, 100) 50000
simple_rnn_12 (SimpleRNN) (None, 1403, 200) 60200
=================================================================
Total params: 110,200
Trainable params: 60,200
Non-trainable params: 50,000
_________________________________________________________________
TensorShape([5, 1403, 100])
TensorShape([5, 1403, 100])
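If you still want a single gradient value per time step, as the ht_vars_agg list in the question intended, you can aggregate each step's hidden state into a scalar before differentiating. Below is a minimal sketch (not part of the original answer) that reuses the embedding_layer and rnn_layer objects from the snippet above; x_batch, sample_steps, and dhtdx are illustrative names:

import tensorflow as tf

x_batch = tf.random.uniform((5, 1403), maxval=500, dtype=tf.int32)
sample_steps = [0, 700, 1402]  # a few illustrative time steps; range(1403) also works

with tf.GradientTape(persistent=True) as tape:
    embedded_x = embedding_layer(x_batch)  # (5, 1403, 100), float
    tape.watch(embedded_x)
    states = rnn_layer(embedded_x)         # (5, 1403, 200)
    # One scalar per chosen time step: mean over samples and RNN units.
    ht_agg = [tf.reduce_mean(states[:, t, :]) for t in sample_steps]

# d(mean h_t)/d(embedded input) is a (5, 1403, 100) tensor per step;
# collapse it to a single sensitivity value per time step.
dhtdx = [float(tf.reduce_mean(tf.abs(tape.gradient(h, embedded_x)))) for h in ht_agg]
del tape  # release the resources held by the persistent tape

The persistent tape lets you call tape.gradient once per time step, and taking tf.abs before averaging keeps positive and negative gradient components from cancelling, so the curve you plot reflects the magnitude of dh_t/dx rather than its sign.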