ROC curve from the linear regression model made of continuous values

Question

I want to build a model whose target has continuous values. First, I split the data:

from sklearn.model_selection import train_test_split

X = data[col_list]
y = data['death rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

First, I built the model with 'from sklearn.linear_model import LinearRegression'.

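from sklearn import metrics
from sklearn.linear_model import LinearRegression

# instantiate the model
lin_regression = LinearRegression()

# fit the model using the training data
lin_regression.fit(X_train, y_train)

# predict on the test set and try to compute the ROC curve
y_predicted = lin_regression.predict(X_test)
fpr, tpr, _ = metrics.roc_curve(y_test, y_predicted)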

But the code does not work. It raises 'ValueError: continuous format is not supported'.
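The error comes from the fact that scikit-learn inspects the type of y_true before building an ROC curve, and roc_curve only accepts binary labels. A minimal sketch of that check, using hypothetical toy arrays in place of the real death-rate data, with sklearn.utils.multiclass.type_of_target:

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.utils.multiclass import type_of_target

# Hypothetical continuous targets, standing in for the death rate
y_continuous = np.array([0.12, 0.40, 0.33, 0.81])
# Hypothetical binary labels, the kind roc_curve expects
y_binary = np.array([0, 1, 0, 1])

print(type_of_target(y_continuous))  # continuous
print(type_of_target(y_binary))      # binary

# Passing continuous targets as y_true reproduces the error from the question
try:
    roc_curve(y_continuous, y_continuous)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

The scores (second argument) may be continuous; it is only the true labels (first argument) that must be binary.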

After that, I tried 'from sklearn import svm' instead:

import numpy as np
from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

random_state = np.random.RandomState(0)

# instantiate the model
classifier = OneVsRestClassifier(
    svm.SVC(kernel="linear", probability=True, random_state=random_state)
)

# fit the model using the training data, then score the test set
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

But it still fails, this time with 'ValueError: Unknown label type'. I found that in the site I was following, the original y data is an (n x 3) array of binary values, for example y_train=[[0,1,1],[0,1,0],...].
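SVC is a classifier, so it rejects a continuous y with 'Unknown label type'; the example being followed works only because its y is a binary (n x 3) label matrix. A minimal sketch of that setting, assuming synthetic data from make_multilabel_classification in place of the original site's data, fits one linear SVC per label column and draws one ROC curve per column:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic multilabel data: Y is an (n x 3) array of 0/1 labels, like y_train above
X, Y = make_multilabel_classification(
    n_samples=300, n_features=5, n_classes=3, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

# One linear SVC per label column; decision_function returns one score column per label
classifier = OneVsRestClassifier(svm.SVC(kernel="linear"))
Y_score = classifier.fit(X_train, Y_train).decision_function(X_test)

# ROC curve and AUC computed separately for each label column
curves, aucs = [], []
for k in range(Y_test.shape[1]):
    fpr, tpr, _ = roc_curve(Y_test[:, k], Y_score[:, k])
    curves.append((fpr, tpr))
    aucs.append(roc_auc_score(Y_test[:, k], Y_score[:, k]))

print(np.round(aucs, 3))
```

Each column of Y is a separate binary problem, which is why an ROC curve is well defined here but not for the continuous death rate.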

My questions are:

Can a linear regression model have an ROC curve?
If so, how can I make one in Python?

Answer 1

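You cannot compute an ROC curve directly from a regression model, because with a continuous target there is no way to define true positives, true negatives, false positives and false negatives. One possible workaround is to choose a threshold and binarize the y variable: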

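import numpy as np
from sklearn import metrics

# threshold: a cut-off you choose on the scale of y; values at or above it count as positive
y_bin = (np.asarray(y_test) >= threshold).astype(int)

fpr, tpr, _ = metrics.roc_curve(y_bin, y_predicted)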
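Putting the threshold approach together end to end, a minimal sketch might look like the following. It assumes synthetic data from make_regression in place of data[col_list] and data['death rate'], and uses the training-set median as a hypothetical threshold; in practice the cut-off should come from what "high" means for your target.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data[col_list] and data['death rate']
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lin_regression = LinearRegression().fit(X_train, y_train)
y_predicted = lin_regression.predict(X_test)

# Hypothetical cut-off: "positive" means at or above the training median
threshold = np.median(y_train)
y_bin = (y_test >= threshold).astype(int)

# The continuous predictions act as ranking scores for the binary labels
fpr, tpr, thresholds = metrics.roc_curve(y_bin, y_predicted)
auc = metrics.roc_auc_score(y_bin, y_predicted)
print(f"AUC = {auc:.3f}")
```

The resulting curve answers a narrower question than the regression itself: how well the predictions rank samples above versus below the chosen threshold. A different threshold gives a different curve.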

Otherwise, you can use other metrics: https://www.sciencedirect.com/science/article/pii/S0031320313002665
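If the goal is simply to evaluate the regression itself, error-based metrics work directly on continuous targets without any threshold. A minimal sketch with scikit-learn's regression metrics, again assuming synthetic data from make_regression rather than the real dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the death-rate data
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
y_predicted = LinearRegression().fit(X_train, y_train).predict(X_test)

mae = mean_absolute_error(y_test, y_predicted)          # average absolute error
rmse = mean_squared_error(y_test, y_predicted) ** 0.5   # root of the mean squared error
r2 = r2_score(y_test, y_predicted)                      # fraction of variance explained
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

RMSE is computed as the square root of mean_squared_error so the sketch does not depend on version-specific keyword arguments.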

