Calculating a Confusion Matrix for Two Text Files

Question

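I would like to calculate a confusion matrix for two text files. Does anyone know of a library or tool, in Python or as a shell script, that can do this?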

For example, I have two files:

File A:

File B:

from which I would get a confusion matrix like this:

   1   2
--------
1| 0   2
2| 0   2

Update: I would like to point out that the original post included row and column labels.

Answer 1

This may be overkill, but scikit-learn will do this quite easily:

from sklearn.metrics import confusion_matrix

# Read the data: one integer label per line, skipping blank lines
with open('file1', 'r') as infile:
    true_values = [int(line) for line in infile if line.strip()]
with open('file2', 'r') as infile:
    predictions = [int(line) for line in infile if line.strip()]

# Make confusion matrix
confusion = confusion_matrix(true_values, predictions)

print(confusion)

This outputs:

[[0 2]
 [0 2]]

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
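If scikit-learn feels like too heavy a dependency for this, the same counts can be tallied with the standard library alone. The following is only a minimal sketch: `confusion_counts` is a hypothetical helper name, and the sample lists are illustrative values chosen to reproduce the matrix in the question, not the contents of the original files.

```python
from collections import Counter


def confusion_counts(true_values, predictions):
    """Return (labels, matrix); rows are true labels, columns are predicted labels."""
    # Use every label seen in either list so the matrix is always square
    labels = sorted(set(true_values) | set(predictions))
    # Count how often each (true, predicted) pair occurs
    counts = Counter(zip(true_values, predictions))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Illustrative data (an assumption, not taken from the original files)
true_values = [1, 1, 2, 2]
predictions = [2, 2, 2, 2]

labels, matrix = confusion_counts(true_values, predictions)
print(labels)  # [1, 2]
print(matrix)  # [[0, 2], [0, 2]]
```

As with scikit-learn's `confusion_matrix`, rows correspond to the true values and columns to the predictions.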

Update: To print with labels, you could convert to a DataFrame with pandas, or use something like this:

def print_confusion(confusion):
    print('   ' + '  '.join([str(n) for n in range(confusion.shape[1])]))
    for rownum in range(confusion.shape[0]):
        print(str(rownum) + '  ' + '  '.join([str(n) for n in confusion[rownum]]))

which prints:

   0  1
0  0  2
1  0  2
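Note that `print_confusion` labels rows and columns by position (0, 1), not by the values that appear in the files (1, 2). One possible variant, sketched here under the assumption that the label list is passed in explicitly, prints the actual labels instead. `format_confusion` and its `labels` parameter are hypothetical names, not part of scikit-learn.

```python
def format_confusion(matrix, labels):
    """Return the matrix as text with a header row and a label on each row."""
    # Width of the widest cell or column label, so columns stay aligned
    cells = [n for row in matrix for n in row]
    width = max(len(str(x)) for x in list(labels) + cells)
    label_width = max(len(str(label)) for label in labels)

    # Header row: blank corner followed by the column labels
    lines = [" " * label_width + "".join(f"  {str(label):>{width}}" for label in labels)]
    # One line per true label: the row label followed by its counts
    for label, row in zip(labels, matrix):
        lines.append(f"{str(label):>{label_width}}" + "".join(f"  {str(n):>{width}}" for n in row))
    return "\n".join(lines)


# Works on a plain nested list or on the array returned by confusion_matrix
print(format_confusion([[0, 2], [0, 2]], labels=[1, 2]))
```

The labels passed in should be in the same order as the rows and columns of the matrix; scikit-learn's `confusion_matrix` uses sorted label order by default.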
