Constructing a confusion matrix from data without sklearn
I want to construct a confusion matrix without using the sklearn library, but I can't get it to come out right. Here is my code:
import numpy as np

def comp_confmat():
    currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
    predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
    cm = []
    classes = int(max(currentDataClass) - min(currentDataClass)) + 1 #find number of classes

    for c1 in range(1,classes+1):#for every true class
        counts = []
        for c2 in range(1,classes+1):#for every predicted class
            count = 0
            for p in range(len(currentDataClass)):
                if currentDataClass[p] == predictedClass[p]:
                    count += 1
            counts.append(count)
        cm.append(counts)
    print(np.reshape(cm,(classes,classes)))
However, this returns:
[[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]]
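But I don't understand why every iteration results in 7, when I reset the count each time and the loops run over different values.
This is what I should be getting (using sklearn's confusion_matrix function):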
[[3 0 0 0 1]
[2 1 0 1 0]
[0 1 3 0 0]
[0 1 0 0 0]
[0 1 1 0 0]]
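You can derive the confusion matrix by counting the number of instances in each combination of actual and predicted classes, as follows: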
import numpy as np

def comp_confmat(actual, predicted):
    # convert to arrays so the element-wise comparisons below also work on plain lists
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)

    # extract the different classes
    classes = np.unique(actual)

    # initialize the confusion matrix
    confmat = np.zeros((len(classes), len(classes)))

    # loop across the different combinations of actual / predicted classes
    for i in range(len(classes)):
        for j in range(len(classes)):
            # count the number of instances in each combination of actual / predicted classes
            confmat[i, j] = np.sum((actual == classes[i]) & (predicted == classes[j]))

    return confmat

# sample data
actual = [1, 3, 3, 2, 5, 5, 3, 2, 1, 4, 3, 2, 1, 1, 2]
predicted = [1, 2, 3, 4, 2, 3, 3, 2, 1, 2, 3, 1, 5, 1, 1]

# confusion matrix
print(comp_confmat(actual, predicted))
# [[3. 0. 0. 0. 1.]
#  [2. 1. 0. 1. 0.]
#  [0. 1. 3. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 1. 1. 0. 0.]]
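In your innermost loop there should be a case distinction: currently the loop counts agreements (positions where the true and predicted labels are equal), but you only want that count when c1 == c2. Because the condition never refers to c1 or c2, every cell receives the same total: the 7 positions where the two lists agree.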
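A minimal sketch of that fix, keeping the loop structure from the question: the only change is that the condition now compares each sample against c1 and c2. The parameter names true_labels and predicted_labels are illustrative, and the labels are assumed to be consecutive integers starting at 1, as in the question.

```python
def comp_confmat(true_labels, predicted_labels):
    # Assumption: labels are consecutive integers starting at 1, as in the question.
    classes = max(true_labels) - min(true_labels) + 1
    cm = []
    for c1 in range(1, classes + 1):      # every true class (row)
        counts = []
        for c2 in range(1, classes + 1):  # every predicted class (column)
            count = 0
            for p in range(len(true_labels)):
                # Count the sample only if it belongs to cell (c1, c2).
                if true_labels[p] == c1 and predicted_labels[p] == c2:
                    count += 1
            counts.append(count)
        cm.append(counts)
    return cm


currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]

cm = comp_confmat(currentDataClass, predictedClass)
for row in cm:
    print(row)
# [3, 0, 0, 0, 1]
# [2, 1, 0, 1, 0]
# [0, 1, 3, 0, 0]
# [0, 1, 0, 0, 0]
# [0, 1, 1, 0, 0]
```

The diagonal of this result sums to 7, which is exactly the number the original loop wrote into every cell.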
Here is another way, using nested list comprehensions:
currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]

classes = int(max(currentDataClass) - min(currentDataClass)) + 1 #find number of classes

counts = [[sum([(currentDataClass[i] == true_class) and (predictedClass[i] == pred_class)
                for i in range(len(currentDataClass))])
           for pred_class in range(1, classes + 1)]
          for true_class in range(1, classes + 1)]
counts
[[3, 0, 0, 0, 1],
[2, 1, 0, 1, 0],
[0, 1, 3, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 1, 0, 0]]
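The comprehension above relies on the labels being consecutive integers starting at 1. If that does not hold (string labels, gaps, or a class that only appears among the predictions), one possible variation is to count the (true, predicted) pairs once and then look each cell up. This is only a sketch: confmat_from_pairs is a hypothetical helper name, and taking the label set from both lists is an assumption, not something the answers here do.

```python
from collections import Counter

def confmat_from_pairs(true_labels, predicted_labels):
    # Hypothetical helper: the label set is the sorted union of both lists.
    labels = sorted(set(true_labels) | set(predicted_labels))
    # Count how often each (true, predicted) pair occurs; missing pairs count as 0.
    pair_counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]

labels, matrix = confmat_from_pairs(currentDataClass, predictedClass)
print(labels)
for row in matrix:
    print(row)
```

Each sample is visited once, rather than once per cell as in the nested comprehension.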
Here is my solution using numpy and pandas:
import numpy as np
import pandas as pd

true_classes = [1, 3, 3, 2, 5, 5, 3, 2, 1, 4, 3, 2, 1, 1, 2]
predicted_classes = [1, 2, 3, 4, 2, 3, 3, 2, 1, 2, 3, 1, 5, 1, 1]

# sort the labels so the row/column order is deterministic (a set has no guaranteed order)
classes = sorted(set(true_classes))
number_of_classes = len(classes)

conf_matrix = pd.DataFrame(
    np.zeros((number_of_classes, number_of_classes), dtype=int),
    index=classes,
    columns=classes)

for true_label, prediction in zip(true_classes, predicted_classes):
    # Each pair of (true_label, prediction) is a position in the confusion matrix (row, column).
    # Basically, here we are counting how many times we see each pair.
    # The count is stored at the matrix index (true_label/row, prediction/column).
    conf_matrix.loc[true_label, prediction] += 1

print(conf_matrix.values)
[[3 0 0 0 1]
[2 1 0 1 0]
[0 1 3 0 0]
[0 1 0 0 0]
[0 1 1 0 0]]