TensorFlow CNN behaving differently when batch is divided

Originally I had my CNN running with the following code:

for i in range(1000):
    x_batch = []
    y_batch = []
    cost_ = 0.

    x_batch = x
    y_batch = y_data

    sess.run(train_op, feed_dict={X: x_batch, Y: y_batch, p_keep_conv: 0.8, p_keep_hidden: 0.5})
    cost_ += (sess.run(cost, feed_dict={X: x_batch, Y: y_batch, p_keep_conv: 0.8, p_keep_hidden: 0.5}))
    print(cost_)

But then I realized I couldn't use a larger dataset, because it would quickly run out of all the available memory. Instead, I rewrote the code like this:

for i in range(1000):
    x_batch = []
    y_batch = []
    cost_ = 0.
    for i in range(0, len(y_data), 100):
        x_batch = x[i:i+100]
        y_batch = y_data[i:i+100]

        sess.run(train_op, feed_dict={X: x_batch, Y: y_batch, p_keep_conv: 0.8, p_keep_hidden: 0.5})
        cost_ += (sess.run(cost, feed_dict={X: x_batch, Y: y_batch, p_keep_conv: 0.8, p_keep_hidden: 0.5}))
    print(cost_)

This is supposed to divide the input into batches to reduce the amount of memory the video card uses. The problem is that it is no longer as accurate as before: accuracy started at 89%, and now it is only 33%.

When switching from Gradient Descent to Stochastic Gradient Descent, there are a few things you need to keep in mind (see the sketch after this list).

  1. The batch size affects the final performance of the neural network. I would try 128 or 256.

    A typical minibatch size is 256, although the optimal size of the minibatch can vary for different applications and architectures.

  2. You want to use a smaller learning rate, and perhaps try incorporating learning rate decay.

    learning rate α is typically much smaller than a corresponding learning rate in batch gradient descent because there is much more variance in the update.

  3. You should shuffle your training data at every epoch.

    If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence.
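
Putting the three points together, here is a minimal sketch of what the training loop could look like. It assumes the placeholders X, Y, p_keep_conv and p_keep_hidden, the cost op, the sess session, and the NumPy arrays x and y_data are the same objects as in your question; RMSPropOptimizer and the decay values are only stand-ins for whatever optimizer and hyperparameters your model actually uses (and if you create these new ops after the session already exists, remember to run the variable initializer again):

import numpy as np
import tensorflow as tf

batch_size = 128                                  # point 1: try 128 or 256

# Point 2: a smaller learning rate with exponential decay (example values only).
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.001, global_step,
                                           decay_steps=1000, decay_rate=0.96,
                                           staircase=True)
train_op = tf.train.RMSPropOptimizer(learning_rate).minimize(cost, global_step=global_step)

for epoch in range(1000):
    cost_ = 0.
    # Point 3: reshuffle the training data at the start of every epoch.
    perm = np.random.permutation(len(y_data))
    x_shuf, y_shuf = x[perm], y_data[perm]
    for start in range(0, len(y_data), batch_size):
        x_batch = x_shuf[start:start + batch_size]
        y_batch = y_shuf[start:start + batch_size]
        # Run the training step and fetch the cost in a single sess.run call.
        _, c = sess.run([train_op, cost],
                        feed_dict={X: x_batch, Y: y_batch,
                                   p_keep_conv: 0.8, p_keep_hidden: 0.5})
        cost_ += c
    print(cost_)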

All quotes are from this article. It may be worth reading further on the differences between gradient descent and stochastic gradient descent.