Is that Cohen Kappa score correct?

Is it correct for cohen_kappa_score to return 0.0 when only 2% of the labels disagree?

from sklearn.metrics import cohen_kappa_score

# Rater 1 labels every sample as class 1; rater 2 agrees on 98 of 100.
y1 = 100 * [1]
y2 = 100 * [1]
y2[0] = 0
y2[1] = 0

print(cohen_kappa_score(y1, y2))
# 0.0

Or am I missing something?

The computation is correct. This is an unfortunate drawback of this agreement measure: if at least one of the raters predicts a single class 100% of the time, the result will always be zero. If you have a few minutes, I encourage you to compute it yourself by hand, following the example on Wikipedia.
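
As a sketch of that by-hand computation for the example above, using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's class frequencies:

# By-hand Cohen's kappa for the 98%-agreement example above.
y1 = 100 * [1]
y2 = 100 * [1]
y2[0] = 0
y2[1] = 0

n = len(y1)
# Observed agreement: fraction of samples on which the raters agree.
p_o = sum(a == b for a, b in zip(y1, y2)) / n  # 0.98

# Chance agreement: per class, the product of each rater's marginal frequency.
classes = set(y1) | set(y2)
p_e = sum((y1.count(c) / n) * (y2.count(c) / n) for c in classes)
# class 1: 1.00 * 0.98 = 0.98; class 0: 0.00 * 0.02 = 0.00 -> p_e = 0.98

kappa = (p_o - p_e) / (1 - p_e)  # (0.98 - 0.98) / 0.02
print(kappa)  # 0.0

Because y1 puts every sample in class 1, p_e collapses to the fraction of class-1 labels in y2, which is exactly p_o, so the numerator is zero no matter how often the raters agree.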

As this paper's abstract puts it,

A limitation of kappa is that it is affected by the prevalence of the finding under observation.

The full text describes the problem more thoroughly with an example, and concludes:

...kappa may not be reliable for rare observations. Kappa is affected by prevalence of the finding under consideration much like predictive values are affected by the prevalence of the disease under consideration. For rare findings, very low values of kappa may not necessarily reflect low rates of overall agreement.
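
To make that concrete, here is a small sketch (the label vectors are invented for illustration): both pairs of raters below agree on 90 of 100 samples, yet kappa differs drastically because of prevalence.

from sklearn.metrics import cohen_kappa_score

# Balanced classes, 10 disagreements out of 100 samples.
a1 = 50 * [0] + 50 * [1]
a2 = 45 * [0] + 5 * [1] + 5 * [0] + 45 * [1]
print(cohen_kappa_score(a1, a2))  # 0.8

# Rare positive class (5% prevalence), also 10 disagreements out of 100.
b1 = 95 * [0] + 5 * [1]
b2 = 90 * [0] + 5 * [1] + 5 * [0]
print(cohen_kappa_score(b1, b2))  # roughly -0.05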

Another useful reference is Interrater reliability: the kappa statistic, which advocates reporting percent agreement alongside Cohen's kappa to describe agreement more completely.
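
A minimal sketch of that recommendation, assuming sklearn's accuracy_score as the percent-agreement measure (with two label vectors it is simply the fraction of matching labels):

from sklearn.metrics import accuracy_score, cohen_kappa_score

y1 = 100 * [1]
y2 = 100 * [1]
y2[0] = 0
y2[1] = 0

# Reporting both tells the full story: the raters almost always match,
# but there is no agreement beyond what chance predicts.
print(f"percent agreement: {accuracy_score(y1, y2):.2f}")    # 0.98
print(f"Cohen's kappa:     {cohen_kappa_score(y1, y2):.2f}")  # 0.00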