Tensorflow dynamic RNN - Shapes
Hello fellow programmers!
I have multiple frames of a video, and I want my RNN to have as many layers as I have frames, so that I can feed one frame to each layer.
Notes:
frame shape = 224, 224, 3 (but I flatten it)
frames per video = 20 = number of inner layers
At the moment I have this:
timesteps = 20
inner_layer_size = 100
output_layer_size = 2
sdev = 0.1
inputs = 224 * 224 * 3
x = tf.placeholder(tf.float32, shape=(None, timesteps, inputs), name="x")
y = tf.placeholder(tf.int32, shape=(None), name="y")
# Compute the layers
lstm_cell = tf.contrib.rnn.LSTMCell(num_units=inner_layer_size)
outputs, state = tf.nn.dynamic_rnn(cell=lstm_cell, dtype=tf.float32, inputs=x)
Wz = tf.get_variable(name="Wz", shape=(inner_layer_size, output_layer_size),
                     initializer=tf.truncated_normal_initializer(stddev=sdev))
bz = tf.get_variable(name="bz", shape=(1, output_layer_size),
                     initializer=tf.constant_initializer(0.0))
logits = tf.matmul(state, Wz) + bz
prediction = tf.nn.softmax(logits)
I know this is not the way I want it.
If you look at the first picture here, it is clear that each layer's input is a part of a frame rather than the whole frame.
My question now is how to change this, and how to adjust my 'Wz' and 'bz'?
Thanks for taking the time :)
The problem is that you are passing the LSTM's state to the dense layer instead of its outputs.
In your case the outputs will have shape [None, 20, 100], i.e. [None, timesteps, inner_layer_size]. You need to split them across time_steps and then pass each slice to the dense layer. This can be achieved with the following code:
# LSTM output
lstm_cell = tf.contrib.rnn.LSTMCell(num_units=inner_layer_size)
outputs, state = tf.nn.dynamic_rnn(cell=lstm_cell, dtype=tf.float32, inputs=x)
# Split the outputs across time steps into a list of [None, 1, inner_layer_size] tensors.
lstm_sequence = tf.split(outputs, timesteps, axis=1)
# Dense layer to be applied at each time step.
def dense(inputs, reuse=False):
    with tf.variable_scope('MLP', reuse=reuse):
        Wz = tf.get_variable(name="Wz", shape=(inner_layer_size, output_layer_size),
                             initializer=tf.truncated_normal_initializer(stddev=sdev))
        bz = tf.get_variable(name="bz", shape=(1, output_layer_size),
                             initializer=tf.constant_initializer(0.0))
        logits = tf.matmul(inputs, Wz) + bz
        prediction = tf.nn.softmax(logits)
        return prediction
# Pass each time step's LSTM output to the dense layer.
# The layer shares its weights across all time steps (via variable_scope reuse).
out = []
for i, frame in enumerate(lstm_sequence):
    if i == 0:
        out.append(dense(tf.reshape(frame, [-1, inner_layer_size])))
    else:
        out.append(dense(tf.reshape(frame, [-1, inner_layer_size]), reuse=True))