Extra column from nowhere in dataframe when trying to extract mode from series

finals_preds = pd.concat(
    [clf_preds, clf_pred_probs, ISFOR_clus_preds, SVM_clus_preds,
     KMEANS_clus_preds, LOCOUT_clus_preds, DBSC_clus_preds],
    axis=1,
)
finals_preds.columns = ['clf_class', 'clf_score', 'ISOFOR', 'SVM-1C', 'KMEANS', 'LOCOUT', 'DBSCAN']
finals_preds

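This is the output:

Then the real problem starts: when I try to add another column that summarizes the row-wise mode of these series, the error says I am trying to fit 2 columns into 1.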

# add a column for all the scores
finals_preds['ENSEMB']= finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)
finals_preds

Error message:

ValueError: Wrong number of items passed 2, placement implies 1

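Then I looked at the right-hand side of the assignment on its own, and the result confused me:

I also printed the mode of each individual series, and they all look normal:

So why is there an extra column when I try to combine their modes?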

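mode returns the most frequent value(s), and with axis=1 it does so for each row. When several values are tied for most frequent in a row, all of them are returned, one per column. You have a binary table, so there are three possible cases: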

     0    1
0  0.0  NaN  # You have more 0 than 1 in the first row
1  1.0  NaN  # You have more 1 than 0 in the second row
2  0.0  1.0  # You have as many 0 as 1 in the third row

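The output has a single column only if no row in the whole dataframe has as many 0s as 1s. As soon as at least one row is tied, the output has 2 columns, and assigning 2 columns to the single column 'ENSEMB' raises the ValueError.

If you want a single representative value for each row, keep only the first column of the result: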
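To reproduce this on a small scale, here is a minimal sketch. The `votes` frame and its values are made up for illustration; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical binary votes from four detectors (illustrative data only).
votes = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 0],
        "KMEANS": [0, 1, 1],
        "LOCOUT": [1, 0, 1],
    }
)

# Row 2 has two 0s and two 1s, so it has two modes and forces a second column.
row_modes = votes.mode(axis=1)
print(row_modes.shape)
print(row_modes)

# Without the tied row, every row has a single mode and one column is enough.
untied_modes = votes.iloc[:2].mode(axis=1)
print(untied_modes.shape)
```

The second column is NaN for every row that has a single mode, which is why the extra column seems to come from nowhere.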

finals_preds['ENSEMB']= \
    finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)[0]
#                                                           HERE ---^^^
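Note that taking column 0 silently decides ties. A minimal sketch of how the ties could be made visible, assuming pandas returns the modes of each row in ascending order (so a 0/1 tie resolves to 0); the `votes` data and the `is_tie` name are hypothetical and not part of the original code:

```python
import pandas as pd

# Hypothetical binary votes (illustrative data only); row 2 is a 2-vs-2 tie.
votes = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 0],
        "KMEANS": [0, 1, 1],
        "LOCOUT": [1, 0, 1],
    }
)

row_modes = votes.mode(axis=1)

# First mode per row; on a tie this is the smaller of the tied values.
ensemble = row_modes[0].astype(int)

# A row is tied when it has more than one non-NaN mode.
is_tie = row_modes.notna().sum(axis=1) > 1

print(ensemble.tolist())
print(is_tie.tolist())
```

If tied rows should be treated differently (for example flagged as outliers), `is_tie` can be used as a mask to overwrite those rows after the assignment.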