tf.keras predictions are bad while evaluation is good

Question

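I'm writing a model in tf.keras, and running model.evaluate() on the training set usually gives an accuracy of about 96%. Evaluation on the test set is usually close, around 93%. However, when I predict manually, the model is usually inaccurate. Here is my code: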

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import pandas as pd

!git clone https://github.com/DanorRon/data
%cd data
!ls

batch_size = 100
epochs = 15
alpha = 0.001
lambda_ = 0.001
h1 = 50

train = pd.read_csv('/content/data/mnist_train.csv.zip')
test = pd.read_csv('/content/data/mnist_test.csv.zip')

train = train.loc['1':'5000', :]
test = test.loc['1':'2000', :]

train = train.sample(frac=1).reset_index(drop=True)
test = test.sample(frac=1).reset_index(drop=True)

x_train = train.loc[:, '1x1':'28x28']
y_train = train.loc[:, 'label']

x_test = test.loc[:, '1x1':'28x28']
y_test = test.loc[:, 'label']

x_train = x_train.values
y_train = y_train.values

x_test = x_test.values
y_test = y_test.values

nb_classes = 10
targets = y_train.reshape(-1)
y_train_onehot = np.eye(nb_classes)[targets]

nb_classes = 10
targets = y_test.reshape(-1)
y_test_onehot = np.eye(nb_classes)[targets]

model = tf.keras.Sequential()
model.add(layers.Dense(784, input_shape=(784,), kernel_initializer='random_uniform', bias_initializer='zeros'))
model.add(layers.Dense(h1, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_), kernel_initializer='random_uniform', bias_initializer='zeros'))
model.add(layers.Dense(10, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2(lambda_), kernel_initializer='random_uniform', bias_initializer='zeros'))

model.compile(optimizer='SGD',
             loss = 'mse',
             metrics = ['categorical_accuracy'])

model.fit(x_train, y_train_onehot, epochs=epochs, batch_size=batch_size)

model.evaluate(x_test, y_test_onehot, batch_size=batch_size)

prediction = model.predict_classes(x_test)
print(prediction)

print(y_test[1:])

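I've heard that in many cases where people have this problem, it's just an issue with the data input. But I can't see any problem with that here, since it almost always predicts wrong (about as often as you would expect if it were guessing at random). How can I fix this?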

Edit: here are the specific results:

Last training step:

Epoch 15/15
49999/49999 [==============================] - 3s 70us/sample - loss: 0.0309 - categorical_accuracy: 0.9615

Evaluation output:

2000/2000 [==============================] - 0s 54us/sample - loss: 0.0352 - categorical_accuracy: 0.9310
[0.03524150168523192, 0.931]

Output from model.predict_classes:

[9 9 0 ... 5 0 5]

Printed output of y_test:

[9 0 0 7 6 8 5 1 3 2 4 1 4 5 8 4 9 2 4]
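When checking predictions by hand, the predicted class of the i-th test sample has to be compared with the i-th label. The sketch below uses small made-up arrays (`probs` and `labels` are hypothetical stand-ins for the output of model.predict(x_test) and for y_test). It shows that the argmax of the softmax output gives the predicted classes, which is what predict_classes returns for a softmax output layer (predict_classes itself was removed in later TensorFlow 2.x releases). It also shows that lining the predictions up against a label array shifted by one position, as printing y_test[1:] next to the predictions does, can make a perfect classifier look bad.

```python
import numpy as np

# Hypothetical softmax outputs for 5 samples and 3 classes
# (stand-in for probs = model.predict(x_test)).
probs = np.array([
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.1, 0.8],
])
labels = np.array([2, 0, 1, 1, 2])  # hypothetical true labels (stand-in for y_test)

# Predicted class = index of the largest probability in each row.
preds = np.argmax(probs, axis=-1)

# Aligned comparison: prediction i against label i.
aligned_acc = np.mean(preds == labels)

# Misaligned comparison: prediction i against label i + 1,
# which is what reading the predictions against labels[1:] amounts to.
shifted_acc = np.mean(preds[:-1] == labels[1:])

# One-hot targets (as built with np.eye) map back to integer labels the same way.
onehot = np.eye(3)[labels]
recovered = np.argmax(onehot, axis=-1)

print(preds, aligned_acc, shifted_acc)
```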

Answer 1

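First, your loss function is wrong: you are in a multi-class classification setting, and you are using a loss function suited to regression, not classification (MSE).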

Change your model compilation to:

model.compile(loss='categorical_crossentropy',
              optimizer='SGD',
              metrics=['accuracy'])
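To see what this change means numerically, here is a small NumPy sketch of both losses for a single sample, following the usual definitions: 'mse' averages the squared error over the classes, while 'categorical_crossentropy' with a one-hot target is the negative log of the probability assigned to the true class. The probabilities are made up for illustration, and the small epsilon clip only imitates the clipping Keras applies; this is not the Keras source.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared errors over the class axis.
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); for a one-hot y_true this is -log(p[true class]).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([0.0, 1.0, 0.0])               # true class is 1
confident_wrong = np.array([0.98, 0.01, 0.01])   # hypothetical prediction
confident_right = np.array([0.01, 0.98, 0.01])   # hypothetical prediction

for name, p in [("wrong", confident_wrong), ("right", confident_right)]:
    print(name, mse(y_true, p), categorical_crossentropy(y_true, p))
```

With K classes the MSE of a prediction can never exceed 2/K (0.2 for MNIST's 10 classes), so even a confidently wrong answer costs little, whereas the cross-entropy keeps growing as the probability of the true class goes to 0.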

See the Keras MNIST MLP example for corroboration, and my own answer to a related question (although here you actually have the inverse problem, i.e. a regression loss in a classification setting).

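Also, it's not clear whether the MNIST variant you are using is already normalized; if not, you should normalize it yourself (in place of your x_train = x_train.values and x_test = x_test.values lines):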

x_train = x_train.values/255
x_test = x_test.values/255
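A minimal sketch of the scaling step on a made-up array of raw pixel intensities (integers 0 to 255, as in the CSV versions of MNIST). The cast to float32 is an optional choice here, made only because it matches Keras' default float type.

```python
import numpy as np

# Hypothetical raw pixel rows (stand-in for x_train.values / x_test.values).
raw = np.array([[0, 64, 128, 255],
                [255, 255, 0, 32]], dtype=np.int64)

# Scale to [0, 1]; float32 matches Keras' default floatx.
scaled = raw.astype(np.float32) / 255.0

print(scaled.min(), scaled.max(), scaled.dtype)
```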

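It's also not clear why you ask for a 784-unit layer, since this is actually the second layer of your NN (the first layer is set implicitly by the input_shape argument), and it certainly does not need to contain one unit for each of the 784 input features.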
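The cost of that extra layer is easy to quantify: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. A quick sketch of the arithmetic for the question's architecture (h1 = 50), with and without the 784-unit layer; `dense_params` is a hypothetical helper, and model.summary() should report the same per-layer counts.

```python
def dense_params(n_in, n_out):
    # Weights (n_in x n_out) plus one bias per unit.
    return n_in * n_out + n_out

# Question's model: input 784 -> Dense(784) -> Dense(50) -> Dense(10)
with_extra = [dense_params(784, 784), dense_params(784, 50), dense_params(50, 10)]
# Same model without the extra 784-unit layer: input 784 -> Dense(50) -> Dense(10)
without_extra = [dense_params(784, 50), dense_params(50, 10)]

print(with_extra, sum(with_extra))        # [615440, 39250, 510] 655200
print(without_extra, sum(without_extra))  # [39250, 510] 39760
```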

Update (after comments):

But why is MSE meaningless for classification?

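This is a theoretical question that does not really fit SO; roughly speaking, it is for the same reason we don't use linear regression for classification: we use logistic regression, and the actual difference between the two approaches is precisely the loss function. Andrew Ng explains this nicely in his hugely popular Machine Learning course at Coursera; see Lecture 6.1 - Logistic Regression | Classification at Youtube (the explanation starts at ~ 3:00), as well as section 4.2 Why Not Linear Regression [for classification]? of the (highly recommended and freely available) textbook An Introduction to Statistical Learning by Hastie, Tibshirani and co-workers.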
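One concrete way to see the loss-function difference is the gradient with respect to the logits z of a softmax output. With cross-entropy it is simply p - y, whereas with MSE it passes through the softmax Jacobian diag(p) - p p^T, which nearly vanishes once the softmax saturates. The sketch below evaluates both standard derivatives for a made-up, confidently wrong prediction (3 classes, logits chosen by hand); it illustrates the math, not Keras internals.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

z = np.array([10.0, 0.0, 0.0])  # hypothetical logits: very confident in class 0
y = np.array([0.0, 1.0, 0.0])   # but the true class is 1
p = softmax(z)
K = len(z)

# Cross-entropy L = -sum(y * log p): gradient w.r.t. the logits is p - y.
grad_ce = p - y

# MSE L = mean((p - y)^2): dL/dp = 2 (p - y) / K, then chain through
# the softmax Jacobian J = diag(p) - p p^T (symmetric, so J^T = J).
J = np.diag(p) - np.outer(p, p)
grad_mse = J @ (2.0 * (p - y) / K)

print(np.abs(grad_ce).max(), np.abs(grad_mse).max())
```

For this input the cross-entropy gradient is close to 1 in magnitude, while the MSE gradient is on the order of 1e-4, so a badly wrong but confident prediction barely gets corrected under MSE.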

And MSE does give a high accuracy, so why doesn't that matter?

Nowadays, practically anything you throw at MNIST will "work", which of course makes it neither correct nor a good approach for more demanding datasets...

Update 2:

whenever I run with crossentropy, the accuracy just flutters around at ~10%

Sorry, I cannot reproduce this behavior... Using a simplified version of your model with the Keras MNIST MLP example, i.e.:

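from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

num_classes = 10

model = Sequential()
model.add(Dense(784, activation='linear', input_shape=(784,)))
model.add(Dense(50, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(),
              metrics=['accuracy'])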

we easily get ~ 92% validation accuracy after only 5 epochs:

history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=5,
                    verbose=1,
                    validation_data=(x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 4s - loss: 0.8974 - acc: 0.7801 - val_loss: 0.4650 - val_acc: 0.8823
Epoch 2/10
60000/60000 [==============================] - 4s - loss: 0.4236 - acc: 0.8868 - val_loss: 0.3582 - val_acc: 0.9034
Epoch 3/10
60000/60000 [==============================] - 4s - loss: 0.3572 - acc: 0.9009 - val_loss: 0.3228 - val_acc: 0.9099
Epoch 4/10
60000/60000 [==============================] - 4s - loss: 0.3263 - acc: 0.9082 - val_loss: 0.3024 - val_acc: 0.9156
Epoch 5/10
60000/60000 [==============================] - 4s - loss: 0.3061 - acc: 0.9132 - val_loss: 0.2845 - val_acc: 0.9196

Note the activation='linear' in the first Dense layer, which is equivalent to not specifying anything, as in your case (as I said, practically everything you throw at MNIST will "work")...
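Because that first Dense layer has no nonlinearity, it merges with the next layer's affine map into a single affine map: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A small NumPy check with random weights illustrates this (small made-up dimensions stand in for 784, 784 and h1, and it uses the column-vector convention rather than Keras' x @ W). The extra linear layer adds parameters, but nothing that a single Dense(h1) layer before the ReLU could not already represent.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 6, 6, 3  # small stand-ins for 784, 784, h1

W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)
x = rng.normal(size=d_in)

# Two stacked layers, the first with a linear (identity) activation;
# this is the pre-activation that the second layer's ReLU would receive.
two_layers = W2 @ (W1 @ x + b1) + b2

# One equivalent layer with merged weights and bias.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))
```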

Final suggestion: try modifying your model to:

model = tf.keras.Sequential()
model.add(layers.Dense(784, activation = 'relu',input_shape=(784,)))
model.add(layers.Dense(h1, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

in order to use the better (and default) 'glorot_uniform' initializer, and remove the kernel_regularizer arguments (they may be the cause of any issues; always start simple!)...
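For reference, the two initializers differ in their sampling range. tf.keras' 'random_uniform' (RandomUniform) defaults to U(-0.05, 0.05) whatever the layer size, while 'glorot_uniform' samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The sketch below only computes those limits for the question's layer shapes and draws one weight matrix with plain NumPy; `glorot_uniform_limit` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    # Glorot/Xavier uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
    return np.sqrt(6.0 / (fan_in + fan_out))

RANDOM_UNIFORM_LIMIT = 0.05  # tf.keras RandomUniform default: minval=-0.05, maxval=0.05

# (fan_in, fan_out) of the Dense kernels in the question's model with h1 = 50.
shapes = [(784, 784), (784, 50), (50, 10)]
for fan_in, fan_out in shapes:
    print(fan_in, fan_out, round(float(glorot_uniform_limit(fan_in, fan_out)), 4),
          RANDOM_UNIFORM_LIMIT)

# Drawing a kernel the Glorot way for the 784 -> 50 layer.
rng = np.random.default_rng(0)
limit = glorot_uniform_limit(784, 50)
W = rng.uniform(-limit, limit, size=(784, 50))
```

The Glorot range adapts to each layer's shape (about 0.06 for the 784 to 784 kernel, about 0.32 for the 50 to 10 kernel), which is the design point: keeping activation and gradient variance roughly stable across layers.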
