Cross-validation score of 0

Question

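I'm new to data analysis, so please forgive me if this is a beginner question. I'm running a PLS regression on my data, where X consists of ordinal variables and y is a binary variable indicating whether an event occurred. I generated some cross-validation scores and got the following results: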

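from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# threat is the asker's pandas DataFrame: column 1 is the event flag, columns 2..95 the ordinal features
X = threat.iloc[:,2:96]
y = threat.iloc[:,1]

pls1 = PLSRegression(n_components=10)
result = pls1.fit_transform(X, y)

# cross_val_score clones and refits pls1 on each of the 5 folds
scoresT = cross_val_score(pls1, X, y, cv=5)
print(scoresT)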

[ 0.          0.          0.          0.55965802  0.        ]

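I understand that each number is the score for one fold, but I expected a spread of values such as [0.2, 0.4, 0.6, 0.7, 0.3] rather than [0, 0, 0, 0.5, 0], so I'm not sure what this means for my data or my model.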

Does anyone have any insight?

Answer 1

When the scoring parameter of cross_val_score is not specified, it falls back to the estimator's default score method. For PLSRegression (as for all regression models in scikit-learn), the score method:

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

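The reason your cross-validation scores are no better than a constant model is probably that you are using a regression model to solve a classification problem. Try a classification model instead.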
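A minimal sketch of the suggested switch to a classifier, assuming the ordinal columns can be treated as numeric features. LogisticRegression, the shuffled StratifiedKFold, and the accuracy and ROC AUC metrics are illustrative choices, not something the answer prescribes; the data are synthetic stand-ins for threat, and the names signal, acc and auc are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features = 200, 94  # 94 columns, matching threat.iloc[:, 2:96]
X = rng.integers(1, 6, size=(n_samples, n_features))  # hypothetical ordinal answers 1..5
# Hypothetical binary event indicator driven by the first five features plus noise
signal = X[:, :5].sum(axis=1) - 15
y = (signal + rng.normal(scale=2.0, size=n_samples) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
# Stratified folds keep the event rate similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Classification metrics instead of the default R^2
acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("accuracy per fold:", acc)
print("ROC AUC per fold:", auc)
```

For a classifier with a binary target, passing cv=5 already uses stratified folds (without shuffling); the explicit splitter here only fixes the shuffle seed. Per-fold accuracy or ROC AUC is easier to interpret for a binary event than R^2.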
