Why are the outputs of jaccard_score and jaccard_similarity_score different?
When I try to use jaccard_similarity_score, I get "Deprecation Warning: jaccard_similarity_score has been deprecated and replaced with jaccard_score. It will be removed in version 0.23. This implementation has surprising behavior for binary and multiclass classification tasks."
The classic interpretation of the Jaccard similarity score matches the output of the deprecated jaccard_similarity_score.
However, jaccard_score and jaccard_similarity_score give different results, even when I try different parameters, as shown below.
# jaccard_similarity_score was removed in scikit-learn 0.23, so this snippet needs an older release
from sklearn.metrics import jaccard_similarity_score, jaccard_score
y_pred = [0,1,0,1,0,0,0,1,0,1]
y_true = [0,0,0,1,0,1,0,1,0,0]
print("jaccard_similarity_score=",jaccard_similarity_score(y_true, y_pred),'\n')
# try each averaging mode of the new function
for param in ['weighted', 'micro', 'macro']:
    print(param, " jaccard_score=", jaccard_score(y_true, y_pred, average=param))
This is the output of the code above:
jaccard_similarity_score= 0.7
weighted jaccard_score= 0.5575
micro jaccard_score= 0.5384615384615384
macro jaccard_score= 0.5125
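As a worked check (computed by hand from the two lists above, not part of the original post): write TP_c, FP_c and FN_c for the true positives, false positives and false negatives of class c. Class 0 has 7 true samples and class 1 has 3, and the Jaccard index of each class is intersection over union of the true and predicted positions:

$$
\begin{aligned}
J_c &= \frac{TP_c}{TP_c + FP_c + FN_c}\\
J_0 &= \frac{5}{5 + 1 + 2} = 0.625, \qquad J_1 = \frac{2}{2 + 2 + 1} = 0.4\\
\text{macro} &= \tfrac{1}{2}(0.625 + 0.4) = 0.5125\\
\text{weighted} &= \frac{7 \cdot 0.625 + 3 \cdot 0.4}{10} = 0.5575\\
\text{micro} &= \frac{5 + 2}{(5 + 2) + (1 + 2) + (2 + 1)} = \frac{7}{13} \approx 0.5385\\
\text{accuracy} &= \frac{7}{10} = 0.7
\end{aligned}
$$

Every jaccard_score value is some average of J_0 and J_1 (or of their pooled counts), while 0.7 is just the fraction of positions where y_pred equals y_true.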
Is there an option that makes the results equal? Is the new jaccard_score working as intended?
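You can see the implementation at https://github.com/scikit-learn/scikit-learn/blob/a5d4c61/sklearn/metrics/classification.py#L311: for binary and multiclass (1-D) labels, jaccard_similarity_score actually computes accuracy, i.e. the fraction of samples whose predicted label equals the true label (7 out of 10 here, hence 0.7).
So jaccard_similarity_score is not really a good function to use here. jaccard_score computes the real Jaccard index of each class, |true ∩ predicted| / |true ∪ predicted|, and then averages it, so no average setting will reproduce the old value; the new function is behaving as intended.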
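A minimal sketch, assuming a current scikit-learn (0.23 or later, where jaccard_similarity_score no longer exists): if the old 0.7 is what you actually want, accuracy_score gives it directly, and jaccard_score with average=None exposes the per-class values that the averaged scores are built from. The variable names below are only for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, jaccard_score

y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 0, 1]
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

# What the deprecated jaccard_similarity_score returned for 1-D labels
acc = accuracy_score(y_true, y_pred)

# Per-class Jaccard index, ordered by label (class 0, then class 1)
per_class = jaccard_score(y_true, y_pred, average=None)

# Default average='binary' scores only the positive class (pos_label=1)
binary = jaccard_score(y_true, y_pred)

# 'macro' is the plain mean of the per-class values,
# 'weighted' weights them by how often each class occurs in y_true
support = np.bincount(y_true)
macro_by_hand = per_class.mean()
weighted_by_hand = np.average(per_class, weights=support)

print("accuracy_score =", acc)
print("per-class jaccard_score =", per_class)
print("binary jaccard_score =", binary)
print("macro:", macro_by_hand, jaccard_score(y_true, y_pred, average="macro"))
print("weighted:", weighted_by_hand, jaccard_score(y_true, y_pred, average="weighted"))
```

This should report 0.7 for accuracy, 0.625 and 0.4 for the two classes, and 0.4 for the default binary score, with the hand-computed macro and weighted averages matching jaccard_score.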
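If you do switch, replace
from sklearn.metrics import jaccard_similarity_score
with
from sklearn.metrics import jaccard_score
and pass the new pos_label parameter when 1 is not one of your labels (for example with string labels), e.g. jaccard_score(y_test, dt_yhat, pos_label="PAIDOFF"). With the default average='binary', pos_label (which defaults to 1) must name one of the labels actually present; in my case the valid values were array(['COLLECTION', 'PAIDOFF'], dtype='<U10').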
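A minimal sketch of the pos_label point with string labels. The label names COLLECTION/PAIDOFF and the variable names y_test/dt_yhat come from the answer above; the five-sample lists are invented toy data, not the answerer's dataset. Recent scikit-learn versions raise a ValueError when the default pos_label=1 is not one of the labels.

```python
from sklearn.metrics import jaccard_score

# Hypothetical toy labels; only the label names are taken from the answer
y_test = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION"]
dt_yhat = ["PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF", "PAIDOFF"]

# With average='binary' (the default), pos_label defaults to 1,
# which is not one of the string labels, so this call fails
try:
    jaccard_score(y_test, dt_yhat)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Naming the positive class explicitly fixes it
score_paidoff = jaccard_score(y_test, dt_yhat, pos_label="PAIDOFF")

# Scoring the other class only needs a different pos_label
score_collection = jaccard_score(y_test, dt_yhat, pos_label="COLLECTION")

print("PAIDOFF:", score_paidoff)
print("COLLECTION:", score_collection)
```

For this toy data, PAIDOFF is true at positions {0, 1, 3} and predicted at {0, 3, 4}, so its score is 2/4 = 0.5; COLLECTION gives 1/3. Which class counts as "positive" is a modelling choice, so it is worth stating explicitly rather than relying on defaults.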