Why do we reshape the MNIST training images to (60000,28,28,1) instead of using them directly like this: (60,28,28)?

Question

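This code trains an image-classification model on the MNIST dataset. What I don't understand is why we reshape the training images to (60000,28,28,1) instead of using them directly like this: (60,28,28).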

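import numpy as np
from tensorflow import keras

num_classes = 10
input_shape = (28, 28, 1)  # height, width, channels (1 = grayscale)

# Load MNIST: 60000 training and 10000 test images, each 28x28 uint8 pixels
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

#print(x_train[0])

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32") / 255

#print(x_train[0])

x_test = x_test.astype("float32") / 255

print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

print()
print(y_train)

# One-hot encode the integer labels: shape (N,) -> (N, 10)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print()
print(y_train)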
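The trailing channel axis matters because Keras convolutional layers such as Conv2D, with the default data_format="channels_last", expect 4D input of shape (batch, height, width, channels), which is why the code above sets input_shape = (28, 28, 1). The following is a minimal NumPy sketch, not Keras's actual implementation: a hypothetical conv2d_nhwc helper that computes a "valid" 2D cross-correlation (what Conv2D computes) over NHWC input. Random data stands in for MNIST so the example runs without downloading anything.

```python
import numpy as np


def conv2d_nhwc(x, kernel):
    """Hypothetical helper: 'valid' 2D cross-correlation on NHWC input.

    x:      array of shape (batch, height, width, channels)
    kernel: array of shape (kernel_h, kernel_w, channels, filters)
    """
    if x.ndim != 4:
        raise ValueError(f"expected 4D input (batch, h, w, c), got shape {x.shape}")
    n, h, w, c = x.shape
    kh, kw, kc, f = kernel.shape
    if c != kc:
        raise ValueError(f"input has {c} channels but kernel expects {kc}")
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((n, out_h, out_w, f), dtype=np.result_type(x, kernel))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + kh, j:j + kw, :]  # (batch, kh, kw, channels)
            # Sum over kernel rows, kernel columns and channels, once per filter
            out[:, i, j, :] = np.tensordot(patch, kernel, axes=([1, 2, 3], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
images = rng.random((4, 28, 28)).astype("float32")  # stand-in for MNIST, shape (N, 28, 28)
kernel = rng.random((3, 3, 1, 8)).astype("float32")  # 3x3 kernel, 1 input channel, 8 filters

try:
    conv2d_nhwc(images, kernel)  # 3D input: no channel axis
except ValueError as err:
    print("3D input rejected:", err)

features = conv2d_nhwc(np.expand_dims(images, -1), kernel)
print(features.shape)  # (4, 26, 26, 8)
```

The kernel has its own channel dimension, so the input must say how many channels it has, even when that number is 1.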

Answer 1

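Understanding the data is very important in machine learning, and this case is a good example. To start with, there are 60000 training images and 10000 test images.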

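Each image is 28*28 pixels, i.e. 28 pixels high and 28 pixels wide, hence (28, 28, 1). The trailing 1 specifies the color depth (number of channels) of each pixel: 1 means a grayscale (black-and-white) image.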
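To make the channel axis concrete: a grayscale image stores one value per pixel, so its depth is 1, while an RGB color image stores three values per pixel. Adding the axis with np.expand_dims does not change any pixel value; it only makes the depth explicit. A minimal NumPy sketch, assuming random uint8 data as a stand-in for one MNIST digit and a zero-filled array as a hypothetical color image:

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)  # stand-in for one MNIST digit

gray_hwc = np.expand_dims(gray, -1)  # (28, 28) -> (28, 28, 1): one channel
color_hwc = np.zeros((28, 28, 3), dtype=np.uint8)  # hypothetical RGB image: three channels

print(gray_hwc.shape, gray_hwc.size)    # (28, 28, 1) 784
print(color_hwc.shape, color_hwc.size)  # (28, 28, 3) 2352

# The single channel holds exactly the original pixel values
print(np.array_equal(gray_hwc[..., 0], gray))  # True
```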

So using (60, 28, 28, 1) here is not possible. As for why we use (60000, 28, 28, 1): that is the shape of our data array, because we have 60000 images of 28*28 pixels, and every pixel has one value in this array.
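The shape arithmetic can be checked directly, since a reshape must preserve the total number of elements. The 60000 training images hold 60000 * 28 * 28 = 47,040,000 pixel values, while (60, 28, 28, 1) has room for only 60 * 28 * 28 = 47,040, so NumPy refuses it. A minimal sketch, assuming a zero-filled uint8 array with the same shape as MNIST's x_train instead of the real data:

```python
import numpy as np

x_train = np.zeros((60000, 28, 28), dtype=np.uint8)  # same shape as MNIST's x_train
print(x_train.size)  # 47040000

try:
    x_train.reshape(60, 28, 28, 1)  # only 47040 slots: sizes do not match
except ValueError as err:
    print("reshape failed:", err)

# Three equivalent ways to add the trailing channel axis (all return views, no copy)
a = np.expand_dims(x_train, -1)
b = x_train.reshape(60000, 28, 28, 1)
c = x_train[..., np.newaxis]
print(a.shape, b.shape, c.shape)  # each is (60000, 28, 28, 1)
```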

To simplify, suppose we had only 1 image: its shape would be (1, 28, 28, 1), which can easily be written as a 28*28 matrix.
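The single-image case can be sketched the same way. Slicing with x_train[:1] keeps the batch axis and gives (1, 28, 28, 1), the batched shape a model built with input_shape = (28, 28, 1) works on; dropping the length-1 axes with np.squeeze recovers the plain 28*28 matrix. The random array below is an assumed stand-in for the preprocessed MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.random((100, 28, 28, 1)).astype("float32")  # stand-in, shape (N, 28, 28, 1)

one_batch = x_train[:1]         # keeps the batch axis: (1, 28, 28, 1)
one_image = x_train[0]          # drops the batch axis: (28, 28, 1)
matrix = np.squeeze(one_batch)  # removes every length-1 axis: (28, 28)

print(one_batch.shape, one_image.shape, matrix.shape)
# Same 784 pixel values, just different shapes
print(np.array_equal(matrix, one_image[:, :, 0]))  # True
```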

Answer 2

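Reshaping is also important for fitting other architectures, for example by resizing with tensorflow.image.resize.

InceptionV3 requires inputs of at least 75*75 pixels:

import tensorflow

# x_train / x_test must already have shape (N, 28, 28, 1):
# tf.image.resize expects a 4D (batch, height, width, channels) or 3D tensor
x_train = tensorflow.image.resize(x_train, [75,75])
x_test = tensorflow.image.resize(x_test, [75,75])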
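The same (N, 28, 28, 1) layout is what tf.image.resize operates on: it takes a 4D (batch, height, width, channels) or 3D (height, width, channels) tensor, so a 3D (60000, 28, 28) array would be read as a single 60000 x 28 image with 28 channels. For InceptionV3, the Keras documentation also asks for exactly 3 input channels. The NumPy sketch below swaps in nearest-neighbor resizing (simpler than tf.image.resize, which defaults to bilinear) and repeats the grayscale channel three times; the helper name resize_nearest_nhwc is hypothetical and random data stands in for MNIST.

```python
import numpy as np


def resize_nearest_nhwc(x, new_h, new_w):
    """Hypothetical helper: nearest-neighbor resize of a (N, H, W, C) array."""
    n, h, w, c = x.shape
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return x[:, rows][:, :, cols]


rng = np.random.default_rng(0)
x_train = rng.random((8, 28, 28, 1)).astype("float32")  # stand-in for preprocessed MNIST

resized = resize_nearest_nhwc(x_train, 75, 75)  # (8, 75, 75, 1)
rgb = np.repeat(resized, 3, axis=-1)            # (8, 75, 75, 3): gray copied to 3 channels
print(resized.shape, rgb.shape)
```

Because the helper unpacks four dimensions, it only works once the channel axis has been added, mirroring the shape requirement described above.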
