Problem while calculating a ROC curve using sklearn for an array of binary labels and an array of float scores
I am trying to compute a ROC curve for a set of predictions like this:
fpr, tpr, thresholds = roc_curve(y_test, probas)
Here is the y_test array:
array([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11,
-9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9,
-8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74,
-8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5,
-8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26,
-8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94,
-7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71,
-7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27,
-7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83,
-6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16,
-6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92], dtype=object)
This is the probas array:
array([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=object)
Now when I try to run
fpr, tpr, thresholds = roc_curve(y_test, probas)
I get a ValueError:
--> 318 raise ValueError("{0} format is not supported".format(y_type))
319
320 check_consistent_length(y_true, y_score, sample_weight)
ValueError: continuous format is not supported
How can I fix this?
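It looks like you swapped the target scores and the binary labels. I also had to remove dtype=object from your arrays to make it work. A working solution is below. According to the official documentation, the first argument of roc_curve is the binary labels (in {0, 1}) and the second argument is the target scores. You were passing probas as the target scores and y_test as the binary labels, even though y_test holds the continuous scores and probas holds the 0/1 labels.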
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
y_test = np.asarray([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11, -9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9, -8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74, -8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5, -8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26, -8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94, -7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71, -7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27, -7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83, -6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16, -6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92])
probas = np.asarray([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Binary labels go first, continuous scores second.
fpr, tpr, thresholds = roc_curve(probas, y_test)
plt.plot(fpr, label='fpr')
plt.plot(tpr, label='tpr')
plt.legend(fontsize=16)
plt.show()
Output: [figure: plot of fpr and tpr]
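To make the two points above concrete (argument order and the dtype=object issue), here is a minimal sketch on a small toy dataset. The arrays `scores_obj` and `labels_obj` are hypothetical stand-ins for the arrays in the question, and casting with `astype` is just one way to get rid of the object dtype, not necessarily how the original arrays should be built.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical toy data stored as object arrays, mirroring the question.
scores_obj = np.array([-10.5, -9.4, -8.9, -7.2, -6.1, -5.0], dtype=object)
labels_obj = np.array([1, 1, 0, 1, 0, 0], dtype=object)

# Cast to plain numeric dtypes so sklearn can infer the target type.
y_score = scores_obj.astype(float)
y_true = labels_obj.astype(int)

# Correct order: binary labels first, continuous scores second.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Swapped order: the continuous values land in the label position
# and roc_curve rejects them with a ValueError.
try:
    roc_curve(y_score, y_true)
    swapped_error = None
except ValueError as exc:
    swapped_error = str(exc)

print("AUC:", auc)
print("Error with swapped arguments:", swapped_error)
```

In this toy data the positive labels sit at the lowest scores, so the AUC comes out well below 0.5. If the same happens with real data, it may mean that lower scores indicate the positive class, in which case passing the negated scores (`-y_score`) is worth considering.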