Calculate the confusion matrix for different columns in a pandas DataFrame?
I have a dataframe with 3000 rows and 3 columns, as follows:
0 col1 col2 col3
ID1 1 0 1
Id2 1 1 0
Id3 0 1 1
Id4 2 1 0
Id5 2 2 3
… .. .. ..
Id3000 3 1 0
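In this dataframe, every value in each row and column is the outcome of a prediction problem, coded as follows within each column: 0 means TP, 1 means FP, 2 means TN, and 3 means FN. So I want to calculate the accuracy of each column. Something like this: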
Accuracy result:
col1 col2 col3
0.67 0.68 0.79
Any ideas on how I can compute important metrics (such as accuracy or F-measure) in a very efficient way?
Here is one approach:
import io
import pandas as pd

data = """
id col1 col2 col3
ID1 1 0 1
Id2 1 1 0
Id3 0 1 1
Id4 2 1 0
Id5 2 2 3
"""
# Build a sample DataFrame for testing
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df)

accuracy = {}  # final results, keyed by column name
# Select every column whose name starts with 'col'
select_cols = [col for col in df.columns if col.startswith('col')]
for col in select_cols:
    df1 = df.groupby(col).size()
    t = [0, 0, 0, 0]  # [TP, FP, TN, FN]: 0 = TP, 1 = FP, 2 = TN, 3 = FN
    for v in df1.index:
        t[v] = df1[v]
    accuracy[col] = (t[0] + t[2]) / sum(t)  # Accuracy = (TP + TN) / (TP + TN + FP + FN)

df_acc = pd.DataFrame.from_dict(accuracy, orient='index').T
print('Accuracy:')
print(df_acc)
Output:
Accuracy:
col1 col2 col3
0 0.6 0.4 0.4
Or another solution (better, I think), which replaces the two nested for loops:
for col in select_cols:
    # Share of rows coded 0 (TP) or 2 (TN)
    accuracy[col] = df[col].isin([0, 2]).sum() / df[col].count()

df_acc = pd.DataFrame.from_dict(accuracy, orient='index').T.reset_index(drop=True)
print('Accuracy:')
print(df_acc)
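Since the question also asks about the F-measure, here is a rough sketch that counts the four codes per column without an explicit loop and derives several metrics at once. It is not part of the answer above. The names `counts` and `metrics` are made up for illustration, and it assumes the same 0 = TP, 1 = FP, 2 = TN, 3 = FN coding and the same five sample rows.

```python
import io
import pandas as pd

data = """
id col1 col2 col3
ID1 1 0 1
Id2 1 1 0
Id3 0 1 1
Id4 2 1 0
Id5 2 2 3
"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
cols = [c for c in df.columns if c.startswith("col")]

# One column per code, one row per 'col*' column:
# counts[k][col] is how many rows of `col` carry code k.
counts = pd.DataFrame({k: (df[cols] == k).sum() for k in range(4)})
tp, fp, tn, fn = counts[0], counts[1], counts[2], counts[3]

# A zero denominator (e.g. no TP and no FP) yields NaN rather than an error.
metrics = pd.DataFrame({
    "accuracy": (tp + tn) / (tp + fp + tn + fn),
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
    "f1": 2 * tp / (2 * tp + fp + fn),
}).T
print(metrics)
```

On the sample data the accuracy row matches the 0.6, 0.4, 0.4 shown above. Because every step is a whole-column operation, this should also scale to the full 3000 rows without change.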