Why are the outputs of jaccard_score and jaccard_similarity_score different?

When I try to use jaccard_similarity_score, I get "Deprecation Warning: jaccard_similarity_score has been deprecated and replaced with jaccard_score. It will be removed in version 0.23. This implementation has surprising behavior for binary and multiclass classification tasks."

The classic explanation of the Jaccard Similarity Score matches the output of the deprecated jaccard_similarity_score.

However, jaccard_score and jaccard_similarity_score give different results, even when I try different values of the average parameter, as shown below.

from sklearn.metrics import jaccard_similarity_score, jaccard_score  
y_pred = [0,1,0,1,0,0,0,1,0,1]  
y_true = [0,0,0,1,0,1,0,1,0,0] 
print("jaccard_similarity_score=",jaccard_similarity_score(y_true, y_pred),'\n')  
for param in ['weighted', 'micro', 'macro']:  
    print(param, " jaccard_score=", jaccard_score(y_true, y_pred,  average=param))    

This is the output of the code above:

jaccard_similarity_score= 0.7 

weighted  jaccard_score= 0.5575  
micro  jaccard_score= 0.5384615384615384  
macro  jaccard_score= 0.5125 

Is there an option that makes the results equal? Is the new jaccard_score working as intended?

You can see the implementation at https://github.com/scikit-learn/scikit-learn/blob/a5d4c61/sklearn/metrics/classification.py#L311

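For binary and multiclass inputs, jaccard_similarity_score actually computes accuracy, which is why it returns 0.7 here: 7 of the 10 predictions match the true labels.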

So jaccard_similarity_score is really not a good function to use here.
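To make the difference concrete, here is a minimal pure-Python sketch that recomputes the numbers from the question by hand. The helper name class_counts is made up for illustration and this is not scikit-learn's implementation. It assumes the usual definitions: accuracy is the fraction of matching positions, and the per-class Jaccard index is TP / (TP + FP + FN).

```python
# Data copied from the question.
y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 0, 1]
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]


def class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) when `label` is treated as the positive class.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn


# Accuracy: fraction of positions where the prediction equals the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

labels = sorted(set(y_true) | set(y_pred))
counts = {c: class_counts(y_true, y_pred, c) for c in labels}

# Per-class Jaccard index: TP / (TP + FP + FN).
per_class = {c: tp / (tp + fp + fn) for c, (tp, fp, fn) in counts.items()}

# Number of true samples in each class, used by the weighted average.
support = {c: y_true.count(c) for c in labels}

# macro: unweighted mean of the per-class scores.
macro = sum(per_class.values()) / len(labels)
# weighted: mean of the per-class scores weighted by class support.
weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
# micro: pool TP, FP and FN over all classes before dividing.
tp_sum = sum(tp for tp, _, _ in counts.values())
micro = tp_sum / sum(tp + fp + fn for tp, fp, fn in counts.values())

print("accuracy =", accuracy)
print("per-class Jaccard =", per_class)
print("macro =", macro, "weighted =", weighted, "micro =", micro)
```

The accuracy comes out as 0.7, the value the deprecated function printed, while the macro, weighted and micro averages reproduce the three jaccard_score values in the question. The score for class 1 alone is 2 / (2 + 2 + 1) = 0.4, which is what jaccard_score reports with its default average='binary' and pos_label=1. None of the averages reduce to accuracy on this data, so no setting of average makes the two functions agree here.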

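from sklearn.metrics import jaccard_similarity_score needs to be replaced with from sklearn.metrics import jaccard_score. When the labels are strings rather than 0/1, the new pos_label parameter is also needed, because its default value of 1 is not one of the labels. For example: jaccard_score(y_test, dt_yhat, pos_label="PAIDOFF"). The valid labels for pos_label in this case are: array(['COLLECTION', 'PAIDOFF'], dtype='<U10')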
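As a rough illustration of what choosing a positive label means, the sketch below computes a binary Jaccard index by hand for string labels. The function binary_jaccard and the sample data are hypothetical (only the names y_test, dt_yhat and the two label strings come from the answer above), and this is not scikit-learn's code.

```python
def binary_jaccard(y_true, y_pred, pos_label):
    """Jaccard index TP / (TP + FP + FN) with `pos_label` as the positive class.

    Hypothetical helper for illustration only.
    """
    valid = sorted(set(y_true) | set(y_pred))
    if pos_label not in valid:
        # Reject a positive label that never occurs in the data.
        raise ValueError(f"pos_label={pos_label!r} is not one of {valid}")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    return tp / (tp + fp + fn)


# Made-up sample data using the two labels from the answer above.
y_test = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION"]
dt_yhat = ["PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF", "PAIDOFF"]

score_paidoff = binary_jaccard(y_test, dt_yhat, pos_label="PAIDOFF")
score_collection = binary_jaccard(y_test, dt_yhat, pos_label="COLLECTION")
print("PAIDOFF:", score_paidoff)
print("COLLECTION:", score_collection)
```

The two calls give different scores, which is why a binary Jaccard score needs to be told which label counts as positive.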

https://github.com/DiamondLightSource/SuRVoS/issues/103