TF accuracy score and confusion matrix disagree. Is TensorFlow shuffling data on each access of BatchDataset?

Question

The accuracy reported by model.evaluate() is very different from the accuracy computed from either the Sklearn or the TF confusion matrix.

from sklearn.metrics import confusion_matrix
...

training_data, validation_data, testing_data = load_img_datasets()
# These ^ are tensorflow.python.data.ops.dataset_ops.BatchDataset

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = create_model(INPUT_SHAPE, NUM_CATEGORIES)
    optimizer = tf.keras.optimizers.Adam()
    metrics = ['accuracy']
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=metrics)

history = model.fit(training_data, epochs=epochs,
                    validation_data=validation_data)

testing_data.shuffle(len(testing_data), reshuffle_each_iteration=False)
# I think this ^ is preventing additional shuffles on access

loss, accuracy = model.evaluate(testing_data)
print(f"Accuracy: {(accuracy * 100):.2f}%")
# Prints 
# Accuracy: 78.7%

y_hat = model.predict(testing_data)
y_test = np.concatenate([y for x, y in testing_data], axis=0)
c_matrix = confusion_matrix(np.argmax(y_test, axis=-1),
                            np.argmax(y_hat, axis=-1))
print(c_matrix)
# Prints result that does not agree:
# Confusion matrix:
#[[ 72 111  54  15  69]
# [ 82 100  44  16  78]
# [ 64 114  52  21  69]
# [ 71 106  54  21  68]
# [ 79 101  51  25  64]]
# Accuracy calculated from CM = 19.3%
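
The 19.3% figure can be checked by hand: accuracy from a confusion matrix is the sum of the diagonal (correct predictions) divided by the total count. A minimal sketch in plain Python, using the matrix printed above; the variable names are illustrative only:

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
c_matrix = [
    [72, 111, 54, 15, 69],
    [82, 100, 44, 16, 78],
    [64, 114, 52, 21, 69],
    [71, 106, 54, 21, 68],
    [79, 101, 51, 25, 64],
]

correct = sum(c_matrix[i][i] for i in range(len(c_matrix)))  # diagonal entries
total = sum(sum(row) for row in c_matrix)                    # all samples
cm_accuracy = correct / total

print(f"{correct} / {total} = {cm_accuracy * 100:.1f}%")
# 309 / 1601 = 19.3%
```

With five roughly balanced classes, 19.3% is close to the 20% that pairing labels and predictions at random would give. That is consistent with the two arrays being in different orders, rather than with the model itself being poor.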

At first I thought TensorFlow was reshuffling testing_data on each access, so I added testing_data.shuffle(len(testing_data), reshuffle_each_iteration=False), but the results still disagree.
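
One detail worth checking here: tf.data.Dataset.shuffle does not modify a dataset in place. It returns a new dataset, so a call whose result is discarded has no effect. A small sketch on a toy dataset (the names ds, unchanged, fixed, first and second are illustrative, not from the original code):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# shuffle() returns a new dataset; discarding the result leaves ds untouched.
ds.shuffle(10, reshuffle_each_iteration=False)
unchanged = [int(v) for v in ds]   # still 0..9 in the original order

# Keeping the returned dataset is what makes the setting take effect.
fixed = ds.shuffle(10, reshuffle_each_iteration=False)
first = [int(v) for v in fixed]
second = [int(v) for v in fixed]   # same order as `first` on every pass

print(unchanged)
print(first == second)
```

If the shuffling happens inside load_img_datasets() (an assumption, since its body is not shown), it would need to be disabled there, or the labels and predictions would need to be collected in the same pass over the data.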

I also tried the TF confusion matrix:

y_hat = model.predict(testing_data)
y_test = np.concatenate([y for x, y in testing_data], axis=0)
true_class = tf.argmax(y_test, 1)
predicted_class = tf.argmax(y_hat, 1)
cm = tf.math.confusion_matrix(true_class, predicted_class, NUM_CATEGORIES)
print(cm)

...with similar results.

Obviously the predicted labels have to be compared against the correct labels. What am I doing wrong?

Answer 1

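I can't find a source, but it seems TensorFlow is still shuffling the test data behind the scenes. You can try iterating over the dataset once, so that the predictions and the true classes come from the same batches: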

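import numpy as np

# Integer arrays, so the class indices are not silently cast to float
predicted_classes = np.array([], dtype=np.int64)
true_classes = np.array([], dtype=np.int64)

# A single pass: each batch of predictions is paired with its own labels
for x, y in testing_data:
    batch_pred = np.argmax(model(x, training=False), axis=-1)
    predicted_classes = np.concatenate([predicted_classes, batch_pred])
    true_classes = np.concatenate([true_classes, np.argmax(y.numpy(), axis=-1)])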

model(x) is used here for faster execution. From the source:

Computation is done in batches. This method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using __call__ is recommended for faster execution, e.g., model(x)

If that doesn't work, you can try model.predict(x) instead.
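
To illustrate why a single pass matters, here is a toy simulation in plain Python. It is not TensorFlow code: ReshufflingDataset and perfect_model are hypothetical stand-ins for a dataset that yields its elements in a new order on every iteration and for a model that is always right.

```python
import random

class ReshufflingDataset:
    """Toy stand-in: yields (x, y) pairs in a new order on every pass."""
    def __init__(self, pairs, seed=0):
        self._pairs = list(pairs)
        self._rng = random.Random(seed)

    def __iter__(self):
        order = self._pairs[:]
        self._rng.shuffle(order)
        return iter(order)

def perfect_model(x):
    # Hypothetical model that always predicts the true class (x % 5).
    return x % 5

data = ReshufflingDataset([(x, x % 5) for x in range(1000)])

# Two separate passes: predictions and labels come out in different orders.
preds = [perfect_model(x) for x, _ in data]
labels = [y for _, y in data]
two_pass_acc = sum(p == t for p, t in zip(preds, labels)) / len(labels)

# One pass: each prediction is paired with the label of the same element.
pairs = [(perfect_model(x), y) for x, y in data]
one_pass_acc = sum(p == t for p, t in pairs) / len(pairs)

print(f"two passes: {two_pass_acc:.2f}, one pass: {one_pass_acc:.2f}")
```

Even with a model that never makes a mistake, the two-pass accuracy drops to roughly chance level for five classes, while the one-pass accuracy stays at 1.0.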

scikit-learn

tensorflow

tensorflow-datasets

tensorflow2.0