How to convert a confusion matrix to a dataframe?
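I would like to know how to convert a confusion matrix from scikit-learn into a dataframe.
I don't know whether the confusion matrices (cm) of the different models can all be combined into one. I am asking for readability: at the moment I have to print each cm in the terminal and copy it into an Excel file by hand, which is tedious because I run the script many times with different parameters.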
models = {'Model_SVC': model1, 'Model_G_NB': model2, 'Model_LR': model3, 'Model_RF': model4, 'Model_KN': model5, 'Model_MLP': model6}
cv_splitter = KFold(n_splits=10, shuffle=False, random_state=None)
for model_name, model in models.items():
    y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter)
    print("Model: {}".format(model_name))
    print("Accuracy: {}".format(accuracy_score(ylabels, y_pred)))
    cm = confusion_matrix(ylabels, y_pred)
    output = pd.DataFrame()
    print("matrice confusion: {}".format(cm), file=f)
The matrices look like this:
Model: Model_SVC
Accuracy: 0.5692307692307692
matrice confusion: [[ 34 4 46]
[ 10 2 33]
[ 16 3 112]]
Model: Model_G_NB
Accuracy: 0.43846153846153846
matrice confusion: [[31 22 31]
[10 13 22]
[27 34 70]]
Model: Model_LR
Accuracy: 0.5461538461538461
matrice confusion: [[ 30 4 50]
[ 11 0 34]
[ 16 3 112]]
Model: Model_RF
Accuracy: 0.5846153846153846
matrice confusion: [[ 40 5 39]
[ 17 1 27]
[ 20 0 111]]
Model: Model_KN
Accuracy: 0.4846153846153846
matrice confusion: [[33 10 41]
[14 12 19]
[41 9 81]]
Model: Model_MLP
Accuracy: 0.5153846153846153
matrice confusion: [[ 17 0 67]
[ 12 0 33]
[ 13 1 117]]
I would like something like this:
F C M
0 34 4 46
1 10 2 33
2 16 3 112
3 31 22 31 => second cm
4 10 13 22
5 27 34 70
6 30 4 50 => third cm
7 11 0 34
8 16 3 112
...
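Because I am using a for loop, I would like the cm to follow one another so that at the end I can export the data to a single Excel or CSV file: one dataframe that stacks all the cm one after another.
Converting any 2D matrix (confusion or not) to a pandas dataframe is straightforward: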
from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# result:
[[2 0 0]
[0 0 1]
[1 0 2]]
import pandas as pd
df = pd.DataFrame(cm)
print(df)
# result:
0 1 2
0 2 0 0
1 0 0 1
2 1 0 2
The result comes complete with row and column names.
Merging dataframes is also straightforward:
cm2 = [[1, 0, 0],
[0, 0, 1],
[2, 0, 1]]
df2 = pd.DataFrame(cm2)
cm3 = [[0, 0, 2],
[1, 2, 1],
[2, 0, 0]]
df3 = pd.DataFrame(cm3)
frames = [df, df2, df3]
final = pd.concat(frames)
print(final)
# result:
0 1 2
0 2 0 0
1 0 0 1
2 1 0 2
0 1 0 0
1 0 0 1
2 2 0 1
0 0 0 2
1 1 2 1
2 2 0 0
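The concatenated frame above repeats the index 0..2 for each matrix and uses integer column names, whereas the question asks for a running index and the columns F, C, M. A minimal sketch of one way to get that layout, assuming the three labels are simply the column names shown in the question's desired output (their meaning is not stated there):

```python
import pandas as pd

# Column labels copied from the question's desired output (assumed, not explained there)
labels = ["F", "C", "M"]

cm1 = [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
cm2 = [[1, 0, 0], [0, 0, 1], [2, 0, 1]]
cm3 = [[0, 0, 2], [1, 2, 1], [2, 0, 0]]

# Build one labelled frame per matrix
frames = [pd.DataFrame(cm, columns=labels) for cm in (cm1, cm2, cm3)]

# ignore_index=True drops each frame's own 0..2 index and renumbers the rows 0..8
final = pd.concat(frames, ignore_index=True)
print(final)
```

Without ignore_index=True the rows keep their original labels, which is what produces the repeated 0, 1, 2 in the output above.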
If you are doing this in a loop, you can start from an empty list frames = [], call frames.append(df) for each new dataframe, and use pd.concat(frames) to get the final frame:
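frames = []
for model_name, model in models.items():
    y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter)
    # compare against the true labels used in the question (ylabels)
    cm = confusion_matrix(ylabels, y_pred)
    df = pd.DataFrame(cm)
    frames.append(df)
# concatenate once, after the loop has collected every frame
final = pd.concat(frames)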
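Since the stated goal is a single Excel or CSV file, it may help to record which model each block of rows belongs to before exporting. The sketch below is one possible approach, not the answer's own code: the "model" column is an illustrative addition, the two matrices are copied from the question's printed output so the example runs without the classifiers, and an in-memory buffer stands in for a real file.

```python
import io
import pandas as pd

# Confusion matrices copied from the question's printed output,
# standing in for the cm computed inside the loop
results = {
    "Model_SVC": [[34, 4, 46], [10, 2, 33], [16, 3, 112]],
    "Model_G_NB": [[31, 22, 31], [10, 13, 22], [27, 34, 70]],
}

frames = []
for model_name, cm in results.items():
    df = pd.DataFrame(cm, columns=["F", "C", "M"])
    # hypothetical extra column so every row records its model
    df.insert(0, "model", model_name)
    frames.append(df)

final = pd.concat(frames, ignore_index=True)

# to_csv accepts a file path as well; a buffer is used here to avoid touching disk
buffer = io.StringIO()
final.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name such as "confusion_matrices.csv" (a placeholder name) instead of the buffer writes the same text to disk.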
Store the matrices in a list, then use np.vstack():
import numpy as np
all_cm = list()
for model_name, model in models.items():
    y_pred = cross_val_predict(model, features, ylabels, cv=cv_splitter)
    print("Model: {}".format(model_name))
    print("Accuracy: {}".format(accuracy_score(ylabels, y_pred)))
    cm = confusion_matrix(ylabels, y_pred)
    all_cm.append(cm)
final_matrix = np.vstack(all_cm)
print(final_matrix)
Example with artificial data:
import numpy as np
np.random.seed(0)
all_cm = list()
for i in range(3):
    all_cm.append(np.random.rand(3,3))
final_matrix = np.vstack(all_cm)
print(final_matrix)
[[0.5488135 0.71518937 0.60276338]
[0.54488318 0.4236548 0.64589411]
[0.43758721 0.891773 0.96366276]
[0.38344152 0.79172504 0.52889492]
[0.56804456 0.92559664 0.07103606]
[0.0871293 0.0202184 0.83261985]
[0.77815675 0.87001215 0.97861834]
[0.79915856 0.46147936 0.78052918]
[0.11827443 0.63992102 0.14335329]]
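np.vstack() returns a plain NumPy array, so one more step is needed to reach the dataframe the question asks for. A minimal sketch, assuming the column labels F, C, M from the question's desired output and reusing two matrices from its printed results in place of the loop:

```python
import numpy as np
import pandas as pd

# Two confusion matrices copied from the question's printed output
all_cm = [
    np.array([[34, 4, 46], [10, 2, 33], [16, 3, 112]]),
    np.array([[31, 22, 31], [10, 13, 22], [27, 34, 70]]),
]

# Stack the 3x3 matrices row-wise into a single (6, 3) array
final_matrix = np.vstack(all_cm)

# Wrap the stacked array in a dataframe; the default index runs 0..5
df = pd.DataFrame(final_matrix, columns=["F", "C", "M"])
print(df)
```

Because the stacking happens before the dataframe is created, the row index is already continuous and no re-indexing is required.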