How is sklearn's jaccard_score calculated?

Question

I am trying to understand what is happening in sklearn's jaccard_score.

These are the results I got:

1. jaccard_score([0, 1, 1], [1, 1, 1])
0.6666666666666666

2. jaccard_score([1, 1, 0], [1, 0, 0])
0.5

3. jaccard_score([1, 1, 0], [1, 0, 1])
0.3333333333333333

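I understand that the formula is

intersection / (size of A + size of B - intersection)

I thought the last one should give me 0.2, because the overlap is 1 and the total number of entries is 6, which gives 1/5. But I got 0.33333...

Can anyone explain how sklearn calculates jaccard_score?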

Answer 1

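Per sklearn's documentation, the jaccard_score function "is used to compare set of predicted labels for a sample to the corresponding set of labels in y_true". If the attributes are binary, the score is computed from the confusion matrix as TP / (TP + FP + FN), where TP, FP, and FN are the counts of true positives, false positives, and false negatives. Otherwise, the same computation is done using the confusion matrix of each attribute value/class label.

The above definition for binary attributes/classes can be reduced to the set definition, as explained below.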
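To make the confusion-matrix view concrete, here is a minimal plain-Python sketch of the binary case. It does not call sklearn; the function name `binary_jaccard` and the choice to return 0.0 when there are no positives at all are assumptions of this sketch, not sklearn's implementation.

```python
def binary_jaccard(y_true, y_pred, pos_label=1):
    """Jaccard score for binary labels: TP / (TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells that involve the positive label.
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    denom = tp + fp + fn
    # Assumption of this sketch: return 0.0 if neither vector has a positive.
    return tp / denom if denom else 0.0


# The three examples from the question.
examples = [
    ([0, 1, 1], [1, 1, 1]),
    ([1, 1, 0], [1, 0, 0]),
    ([1, 1, 0], [1, 0, 1]),
]
for y_true, y_pred in examples:
    print(y_true, y_pred, binary_jaccard(y_true, y_pred))
```

True negatives (positions where both vectors are 0) never enter the formula, which is why the vector length does not appear in the denominator.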

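Assume there are three records r1, r2, and r3. The vectors [0, 1, 1] and [1, 1, 1], which are the true and predicted classes of the records, can be mapped to the two sets {r2, r3} and {r1, r2, r3} respectively. Here, each element of a vector indicates whether the corresponding record is present in the set. The Jaccard similarity of the two sets is the same as the similarity value defined for the two vectors: the intersection has 2 elements and the union has 3, giving 2/3. For the third example in the question, [1, 1, 0] and [1, 0, 1] map to {r1, r2} and {r1, r3}. The size of each set is the number of 1s (2), not the length of the vector (3), so the result is 1 / (2 + 2 - 1) = 1/3 rather than 1/5.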
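The mapping from label vectors to sets can be sketched in a few lines of plain Python. The helper names `vector_to_set` and `set_jaccard` are hypothetical, chosen for illustration; the record names r1, r2, r3 follow the answer's notation.

```python
def vector_to_set(vec):
    """Map a 0/1 vector to the set of record names whose entry is 1."""
    return {f"r{i}" for i, v in enumerate(vec, start=1) if v == 1}


def set_jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|."""
    union = a | b
    # Assumption of this sketch: two empty sets give 0.0.
    return len(a & b) / len(union) if union else 0.0


y_true = vector_to_set([1, 1, 0])  # records r1 and r2
y_pred = vector_to_set([1, 0, 1])  # records r1 and r3

print(sorted(y_true & y_pred))      # shared records
print(sorted(y_true | y_pred))      # all records in either set
print(set_jaccard(y_true, y_pred))  # intersection size / union size
```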

python

similarity

information-theory

scikit-learn