Why is reduce_mean applied to the output of sparse_softmax_cross_entropy_with_logits?
Several tutorials apply reduce_mean to the output of sparse_softmax_cross_entropy_with_logits, for example:
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
or
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=tf.cast(y_, dtype=tf.int32), logits=y_conv))
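For context, here is a side-by-side sketch (my own illustration, not from the tutorials) showing that, per example, the two formulations compute the same quantity; the difference lies only in how the per-example losses are then reduced (summed vs. averaged):
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
logits = tf.constant(np.random.randn(4, 10), dtype=tf.float32)
labels = tf.constant([3, 1, 4, 1], dtype=tf.int32)

# Dense formulation: one-hot labels, explicit softmax + log.
y_onehot = tf.one_hot(labels, depth=10)
dense_per_example = -tf.reduce_sum(y_onehot * tf.log(tf.nn.softmax(logits)), axis=1)

# Sparse formulation: integer labels, softmax applied internally.
sparse_per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

with tf.Session() as sess:
    d, s = sess.run([dense_per_example, sparse_per_example])
    print(np.allclose(d, s, atol=1e-5))  # True: identical per-example losses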
Why is reduce_mean applied to the output of sparse_softmax_cross_entropy_with_logits? Is it because we are working with mini-batches, so we want to compute (via reduce_mean) the average loss over all the samples in the mini-batch?
I found something interesting.
First, we define sparse_vector as
sparse_vector = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=tf.cast(y_, dtype=tf.int32), logits=y_conv)
sparse_vector is a vector holding one loss value per example in the batch, and we need to reduce it to a single scalar by averaging; that is why we apply reduce_mean.
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.InteractiveSession(config=config)

# Load MNIST with integer class labels (one_hot=False), which is what
# sparse_softmax_cross_entropy_with_logits expects.
mnist = input_data.read_data_sets('MNIST_data', one_hot=False)
print(mnist.test.labels.shape)
print(mnist.train.labels.shape)

with tf.name_scope('inputs'):
    X_ = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.int64, [None])

X = tf.reshape(X_, [-1, 28, 28, 1])
h_conv1 = tf.layers.conv2d(X, filters=32, kernel_size=5, strides=1,
                           padding='same', activation=tf.nn.relu, name='conv1')
h_pool1 = tf.layers.max_pooling2d(h_conv1, pool_size=2, strides=2,
                                  padding='same', name='pool1')
h_conv2 = tf.layers.conv2d(h_pool1, filters=64, kernel_size=5, strides=1,
                           padding='same', activation=tf.nn.relu, name='conv2')
h_pool2 = tf.layers.max_pooling2d(h_conv2, pool_size=2, strides=2,
                                  padding='same', name='pool2')

# Flatten, then two fully connected layers with dropout in between.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.layers.dense(h_pool2_flat, 1024, name='fc1', activation=tf.nn.relu)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
h_fc2 = tf.layers.dense(h_fc1_drop, units=10, name='fc2')

# Feed the raw logits to the loss; sparse_softmax_cross_entropy_with_logits
# applies the softmax internally, so do not apply it here.
# y_conv = tf.nn.softmax(h_fc2)
y_conv = h_fc2

# One cross-entropy value per example in the batch ...
sparse_vector = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.cast(y_, dtype=tf.int32), logits=y_conv)
# ... reduced to a single scalar by averaging over the batch.
cross_entropy = tf.reduce_mean(sparse_vector)

sess.run(tf.global_variables_initializer())
# print(sparse_vector)
# print(cross_entropy)
# Tensor("SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits:0", shape=(?,), dtype=float32)
# Tensor("Mean:0", shape=(), dtype=float32)

batch = mnist.train.next_batch(10)
sparse_vector, cross_entropy = sess.run(
    [sparse_vector, cross_entropy],
    feed_dict={X_: batch[0], y_: batch[1], keep_prob: 1.0})
print(sparse_vector)
print(cross_entropy)
The output is:
[2.2213464 2.2676413 2.3555744 2.3196406 2.0794516 2.394274 2.266591
2.3139718 2.345526 2.3952296]
2.2959247
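As a quick sanity check (a small numpy sketch I am adding, not part of the original run), the scalar really is just the arithmetic mean of the ten per-example losses printed above:
import numpy as np

losses = np.array([2.2213464, 2.2676413, 2.3555744, 2.3196406, 2.0794516,
                   2.394274, 2.266591, 2.3139718, 2.345526, 2.3952296])
print(np.mean(losses))  # ~2.2959247, matching tf.reduce_mean
Note that every value is close to ln(10) ≈ 2.30, which is exactly what you would expect from an untrained 10-class classifier whose softmax output is roughly uniform.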
The reason is to get the average loss over the batch.
Usually you train a neural network with input batches of size > 1. Each element in the batch produces its own loss value, and the simplest way to combine these into a single value is to take the mean.
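To see why the mean is usually preferred over the sum, here is a hypothetical comparison (not from the answer above): with reduce_sum the loss, and therefore the gradient magnitude, grows linearly with the batch size, so changing the batch size would implicitly change the effective learning rate; reduce_mean keeps the loss on the same scale regardless of batch size.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
logits = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.int32, [None])
per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
loss_mean = tf.reduce_mean(per_example)  # independent of batch size
loss_sum = tf.reduce_sum(per_example)    # grows with batch size

with tf.Session() as sess:
    for batch_size in (10, 100):
        feed = {logits: np.zeros((batch_size, 10), np.float32),
                labels: np.zeros(batch_size, np.int32)}
        m, s = sess.run([loss_mean, loss_sum], feed_dict=feed)
        # With all-zero logits each example's loss is ln(10) ~= 2.3026,
        # so the mean stays ~2.3026 while the sum is ~2.3026 * batch_size.
        print(batch_size, m, s)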