Python

Question

I'm trying to use KFold on an XGBoost regression problem. A sample of the data looks like this:

When I use the following code:

df = pd.read_csv('../data/df_samp.csv').head(1000)
cat_columns = ['primary_use','meter','hour','weekday','month','wind_compass']
df_processed = pd.get_dummies(df, prefix_sep="_", columns=cat_columns)
X=df_processed.drop(['meter_reading','outlier_ratio','meter_reading_roll_avg','timestamp'],axis=1)
y=df_processed['meter_reading']

scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, shuffle=False)

for train_index, test_index in cv.split(X):
    print("Train Index: ", train_index, "\n")
    print("Test Index: ", test_index)
    X_train, X_test, y_train, y_test = X.values[train_index], X.values[test_index], y.values[train_index], y.values[test_index]
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    predictions = [round(value) for value in y_pred]
    scores.append(r2_score(y_test,predictions))

I get the output

print(scores)
[0.406908684278529, 0.3320925821156784, 0.1039843686445262, 0.395466094618815, 0.13412072574647682, -0.015579242639622182, -0.17008382837529967, 0.3931056789610018, 0.4491969042604125, 0.49641651402527265]

When I try

scores = []
model = XGBClassifier()
cv = KFold(n_splits=10, random_state=42, shuffle=False)
cross_val_score(model, X.values, y.values, cv=10)

I get

ValueError: continuous is not supported

Does anyone know why?

Thanks

Answer 1

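Thanks to MrSoLoDolo for the suggestion.

I needed to use XGBRegressor() instead of XGBClassifier(). The target meter_reading is continuous, and cross_val_score scores a classifier with accuracy by default, which rejects continuous targets with "continuous is not supported".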

Python - cross_val_score on regression problem is not working - 'ValueError: continuous is not supported'


cross-validation

xgboost

k-fold