Plot confusion matrix for all the samples

Question

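Hello, I'm trying to plot a confusion matrix for all the samples I use for testing. However, because I specified a batch_size, the confusion matrix only covers one batch's worth of samples. That is, if I have 3000 samples in total and the batch size is 150, the confusion matrix covers only 150 samples instead of all 3000. Could you please help me figure out what I can do to plot the confusion matrix for all 3000 samples?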

    num_classes = 2
    image_resize = 256
    train_dir ='./..../..'#3000 samples
    test_dir = './.../...'#3000 samples
    batch_size_training = 150
    batch_size_validation = 150
    num_epochs = 10
    
    data_generator = ImageDataGenerator(
        preprocessing_function=preprocess_input,validation_split=0.2)
    
    
    train_generator = data_generator.flow_from_directory(
        train_dir,
        target_size=(image_resize, image_resize),
        batch_size=batch_size_training,
        class_mode='categorical')
    validation_generator = data_generator.flow_from_directory(
        train_dir,
        target_size=(image_resize, image_resize),
        batch_size=batch_size_validation,
        class_mode='categorical')
    test_generator = data_generator.flow_from_directory(
        test_dir,
        target_size=(image_resize, image_resize),
        batch_size=batch_size_validation,
        class_mode='categorical')
    
    x_train, y_train = next(train_generator)
    x_val,y_val = next(validation_generator)
    x_test, y_test = next(test_generator)
    model.compile(loss='categorical_crossentropy',metrics=['accuracy'])


    steps_per_epoch_training = int(np.floor(train_generator.n // batch_size_training ))

    steps_per_epoch_validation = int(np.floor(validation_generator.n // batch_size_validation ))


    fit_history = model.fit(train_generator,
        steps_per_epoch=steps_per_epoch_training,
        epochs=num_epochs,
        validation_data=validation_generator,
        validation_steps=steps_per_epoch_validation,
        verbose=1,
    )

    probs = model.predict(x_test)
    preds = probs.argmax(axis = -1)
    accuracy = 100*(np.mean(preds == y_test.argmax(axis=-1)))
    y_test = np.argmax(y_test,axis=-1)

    print("Classification accuracy: %f " % (accuracy))
    cm =confusion_matrix(y_test,preds)
    print(cm)
    df_cm = pd.DataFrame(cm, range(2), range(2))
    fig = plt.figure(figsize=(10,7))
    sn.set(font_scale=1.4) # for label size
    sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}) # font size
    fig.savefig('CM.jpg')

Answer 1

Your problem is that you are actually pulling only a single batch from each data generator:

    ...
    x_train, y_train = next(train_generator)
    x_val,y_val = next(validation_generator)
    x_test, y_test = next(test_generator)
    ...

You then run the prediction on that single test batch:

    ...
    probs = model.predict(x_test)
    ...

So the code is working exactly as written: it runs predictions on a single batch.
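
To make the batch arithmetic concrete, here is a minimal sketch using a hypothetical stand-in generator (`fake_batches` is not part of Keras) that, like a Keras iterator, hands out one batch per `next()` call. The sample count and batch size are the ones from the question.

```python
import math

def fake_batches(n_samples, batch_size):
    """Hypothetical stand-in for a Keras iterator: yields batches of sample indices."""
    for start in range(0, n_samples, batch_size):
        yield list(range(start, min(start + batch_size, n_samples)))

n_samples, batch_size = 3000, 150

gen = fake_batches(n_samples, batch_size)
first_batch = next(gen)  # this is all the question's code ever looks at

# Number of batches needed to cover every sample once
n_batches = math.ceil(n_samples / batch_size)

# Draining the rest of the generator recovers the remaining samples
all_samples = first_batch + [i for batch in gen for i in batch]

print(len(first_batch))  # 150
print(n_batches)         # 20
print(len(all_samples))  # 3000
```

A single `next()` call therefore covers 1 of the 20 batches, which is why the confusion matrix sums to 150.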

To run predictions on all the test data from the generator, you should be able to simply do this:

    # Predictions for all test data points
    probs = model.predict(test_generator)
    # Labels for all test data points. These are in file order, so they only
    # line up with probs if test_generator was created with shuffle=False.
    y_test = test_generator.labels
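
If the generator is left shuffling (the `flow_from_directory` default), another option is to walk it batch by batch and keep each batch's labels paired with its predictions, so the ordering no longer matters. The following is a sketch of that idea, not the code from this answer: `collect_predictions`, `toy_batches` and `toy_predict` are hypothetical names, and the toy data stands in for `test_generator` and `model.predict`.

```python
import numpy as np

def collect_predictions(batches, predict_fn, n_batches):
    """Run predict_fn on n_batches batches; return (true, predicted) class ids."""
    y_true, y_pred = [], []
    # Keras iterators loop forever, so stop explicitly after n_batches
    for _, (x_batch, y_batch) in zip(range(n_batches), batches):
        y_true.append(np.argmax(y_batch, axis=-1))              # one-hot -> class id
        y_pred.append(np.argmax(predict_fn(x_batch), axis=-1))  # probs -> class id
    return np.concatenate(y_true), np.concatenate(y_pred)

def toy_batches(x, y_onehot, batch_size):
    """Hypothetical stand-in for test_generator: endless (x, y) batches."""
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size], y_onehot[start:start + batch_size]

def toy_predict(x):
    """Hypothetical stand-in for model.predict: class 1 when the feature exceeds 0.5."""
    p1 = (x[:, 0] > 0.5).astype(float)
    return np.stack([1.0 - p1, p1], axis=1)

# Toy data: 10 samples, 2 classes, batch size 4 (so the last batch is short)
x = np.array([0.1, 0.9, 0.2, 0.8, 0.7, 0.3, 0.6, 0.4, 0.95, 0.05]).reshape(-1, 1)
labels = np.array([0, 1, 0, 1, 0, 0, 1, 1, 1, 0])
y_onehot = np.eye(2)[labels]
batch_size = 4
n_batches = int(np.ceil(len(x) / batch_size))

y_true, y_pred = collect_predictions(toy_batches(x, y_onehot, batch_size),
                                     toy_predict, n_batches)

# Confusion matrix with rows = true class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)
print(cm)
```

With the objects from the question, the call would look like `collect_predictions(test_generator, model.predict, len(test_generator))`, since `len()` of a Keras iterator is its number of batches per epoch. The two returned arrays can then be passed to `confusion_matrix(y_true, y_pred)` as in the original plotting code.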
