Using batch normalization with dropout in test phase gives NaN

I am trying to add batch normalization to my CNN and have read a lot of posts about how to do it, but my implementation still yields an array of NaNs when training is set to False.

If I set training to True even at test time, the results are not NaN, but if I then test on the training images, the results are worse than they were during training.

I used a decay of 0.9 and trained for 15,000 iterations.
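For context, the moving statistics that batch normalization uses at test time are exponential moving averages of the batch statistics, updated with that decay/momentum at every training step. A pure-Python sketch of the update rule (illustrative only, not the TF implementation):

```python
# Illustrative sketch of the exponential moving average behind
# moving_mean / moving_variance in batch normalization.
# momentum=0.9 corresponds to the decay mentioned above.
def ema_update(moving, batch_stat, momentum=0.9):
    """One training-step update of a moving statistic."""
    return momentum * moving + (1.0 - momentum) * batch_stat

moving_mean = 0.0
for _ in range(100):  # far fewer steps than 15,000 iterations
    moving_mean = ema_update(moving_mean, 5.0)  # constant batch mean of 5.0

# After ~100 steps the moving mean is already very close to the batch mean,
# so after 15,000 iterations the moving statistics should be fully converged
# as long as the update ops actually run.
print(moving_mean)
```

With decay 0.9 the averages converge within a few dozen steps, so 15,000 iterations is more than enough if the update ops are actually being executed.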

Here is how I build the graph, adding the update ops as a dependency as suggested in the tf.layers.batch_normalization documentation, and then run it with a session:

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    phase_train = tf.placeholder(tf.bool, name='phase_train')

    ###### Other placeholders and variables declarations ######

    # Build a Graph that computes the logits predictions from the inference model.

    loss, eval_prediction = inference(train_data_node, train_labels_node, batch_size, phase_train, dropout_out_keep_prob)

    # Build a Graph that trains the model with one batch of examples and updates the model parameters.

    ###### Should I rather put the dependency here? ######
    train_op = train(loss, global_step)

    saver = tf.train.Saver(tf.global_variables())

    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)

        # Start the queue runners.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        for step in range(startstep, startstep + max_steps):
            feed_dict = fill_feed_dict(train_labels_node, train_data_node,
                                       dropout_out_keep_prob, phase_train,
                                       batch_size, phase_train_val=True,
                                       drop_out_keep_prob_val=1.)
            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)

Here is my batch_norm function:

def batch_norm_layer(inputT, is_training, scope):
    return tf.layers.batch_normalization(inputT, training=is_training, center=False, reuse=None, momentum=0.9)
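For clarity, here is a NumPy sketch (illustrative, not the TF implementation) of what the training flag changes inside batch normalization: with training=True the layer normalizes with the current batch's statistics, with training=False it normalizes with the stored moving statistics, which is why broken or never-updated moving statistics only show up at test time:

```python
import numpy as np

def batch_norm_sketch(x, moving_mean, moving_var, training, eps=1e-3):
    """Illustrative batch-norm forward pass (no scale/offset, as with center=False).

    training=True : normalize with the current batch's statistics.
    training=False: normalize with the stored moving statistics, so bad
                    moving statistics only hurt at inference time.
    """
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0], [3.0, 4.0]])
train_out = batch_norm_sketch(x, None, None, training=True)
# If the moving statistics have converged to the batch statistics
# ([2, 3] and [1, 1] here), both modes produce the same output.
test_out = batch_norm_sketch(x, np.array([2.0, 3.0]), np.array([1.0, 1.0]),
                             training=False)
```

This is why the symptoms above (fine with training=True, NaN with training=False) point at the moving statistics rather than at the learned weights.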

Here is how I restore the model for testing:

phase_train = tf.placeholder(tf.bool, name='phase_train')

###### Other placeholder definitions ######

loss, logits = inference(test_data_node, test_labels_node, batch_size, phase_train, drop_out_keep_prob)
pred = tf.argmax(logits, dimension=3)

saver = tf.train.Saver()

with tf.Session() as sess:
  saver.restore(sess, test_ckpt)

  threads = tf.train.start_queue_runners(sess=sess)

  feed_dict = fill_feed_dict(test_labels_node, test_data_node, drop_out_keep_prob, phase_train, batch_size=1, phase_train_val=False, drop_out_keep_prob_val=1.)

  pred_loss, dense_prediction, predicted_image = sess.run([loss, logits, pred], feed_dict=feed_dict)

Here dense_prediction gives an array of NaNs, so predicted_image is all zeros. Is there a mistake in my construction? How can I fix or diagnose it?

Any help is welcome. I have read plenty of tutorials that use a "hand-made" batch norm, but I can't find a proper guide on how to use the official one, I guess because it is supposed to be obvious, but it isn't working for me!

It turns out the problem came from using batch normalization together with the tf.nn.dropout implementation of dropout.

Switching to tf.layers.dropout solved the problem.
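For anyone hitting the same issue: the two APIs have different semantics. tf.nn.dropout takes a keep_prob and has no training switch, so it is always "live", while tf.layers.dropout takes a rate plus a training argument and becomes the identity at inference. A pure-Python sketch of the two behaviors (names are illustrative, not the TF code):

```python
import random

def nn_style_dropout(x, keep_prob):
    """tf.nn.dropout-style: always active; kept units scaled by 1/keep_prob."""
    return [v / keep_prob if random.random() < keep_prob else 0.0 for v in x]

def layers_style_dropout(x, rate, training):
    """tf.layers.dropout-style: identity when training is False."""
    if not training:
        return list(x)
    return nn_style_dropout(x, 1.0 - rate)

x = [1.0, 2.0, 3.0]
# At inference time, layers-style dropout leaves the input untouched.
inference_out = layers_style_dropout(x, rate=0.5, training=False)
```

Having dropout keyed to the same training flag as batch normalization keeps the two layers consistent between the training and test graphs.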