Wrong data for cross-validation (doesn't work)
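I want to find the optimal regularization coefficient α for a regression problem using 5-fold cross-validation.
When I run the simple code below, it throws an error: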
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold

alphas = np.logspace(-6, 2, 200)
skf = StratifiedKFold(n_splits=5)
lasso_cv = LassoCV(alphas=alphas, random_state=17, max_iter=5000)
for k, (train, test) in enumerate(skf.split(X_train_scaled, y_train)):
    lasso_cv.fit(X_train_scaled[train], y_train[train])
    # print the fold number k (not the skf object) in the "[fold ...]" label
    print("[fold {0}] alpha: {1:.5f}, score: {2:.5f}".format(
        k, lasso_cv.alpha_, lasso_cv.score(X_train_scaled[test], y_train[test])))
      for k, (train, test) in enumerate(skf.split(X_train_scaled, y_train)):
----> 6     lasso_cv.fit(X_train_scaled[train], y_train[train])

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I basically rewrote the code from here (at the very bottom).
I have no NaN or inf values:
where_XNaNs = np.isnan(X_train_scaled)
where_yNaNs = np.isnan(y_train)
print(X_train_scaled[where_XNaNs])
print(y_train[where_yNaNs])
print()
where_Xinfs = np.isinf(X_train_scaled)
where_yinfs = np.isinf(y_train)
print(X_train_scaled[where_Xinfs])
print(y_train[where_yinfs])
[]
Series([], Name: quality, dtype: int64)
[]
Series([], Name: quality, dtype: int64)
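These checks look at the full arrays, not at the per-fold slices that are actually passed to fit. The sketch below shows how slicing a pandas Series with the positional indices from split() can produce NaN even when the full Series has none. The Series, its values and its shuffled index are hypothetical stand-ins, not the asker's data; reindex is used to make the label-based lookup explicit, since in recent pandas `y[positions]` with missing labels raises KeyError instead of returning NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for y_train: index labels are not 0..n-1,
# as happens after train_test_split shuffles the rows.
y = pd.Series([3, 4, 5, 6, 7, 8], index=[10, 4, 7, 0, 1, 2], name="quality")
print(y.isna().sum())  # the full Series has no NaN

# Positional indices like those produced by KFold/StratifiedKFold.split().
positions = np.array([0, 1, 2, 3])

# Label-based lookup: labels 0, 1, 2 exist (but point to other rows),
# label 3 does not exist and becomes NaN.
by_label = y.reindex(positions)

# Position-based lookup: always the first four rows, no NaN.
by_position = y.iloc[positions]

print(by_label.tolist())     # [6.0, 7.0, 8.0, nan]
print(by_position.tolist())  # [3, 4, 5, 6]
```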
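The person who helped me didn't want to write it up as an answer, so I'm posting it myself.
y_train[train] needs to be changed to y_train.iloc[train] (and likewise y_train[test] to y_train.iloc[test]).
The reason: y_train is a pandas Series (note the Series(...) output above), so y_train[train] is a label-based lookup, while split() returns positional indices. If the Series' index is not 0..n-1 (for example after train_test_split), those labels select the wrong rows or labels that don't exist; in the pandas version used here the missing ones came back as NaN, which is what LassoCV.fit rejects (recent pandas raises a KeyError instead). X_train_scaled appears to be a plain NumPy array, so positional indexing already works for it.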
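Putting the fix together, here is a minimal runnable sketch of the corrected loop. Assumptions: X_train_scaled and y_train are replaced by hypothetical synthetic data (four integer "quality" classes with 25 samples each, so StratifiedKFold has enough members per class), and y_train is given a shuffled, non-contiguous index to mimic train_test_split output. The rest mirrors the question's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(17)

# Hypothetical stand-ins for the asker's data: 100 samples, integer
# "quality" labels with 25 members per class.
quality = np.repeat([4, 5, 6, 7], 25)
X_train_scaled = rng.normal(size=(100, 5))
X_train_scaled[:, 0] += quality  # make the first feature informative

# Non-contiguous index labels, as left behind by train_test_split.
y_train = pd.Series(quality, index=rng.permutation(1000)[:100], name="quality")

alphas = np.logspace(-6, 2, 200)
skf = StratifiedKFold(n_splits=5)
lasso_cv = LassoCV(alphas=alphas, random_state=17, max_iter=5000)

scores = []
for k, (train, test) in enumerate(skf.split(X_train_scaled, y_train)):
    # split() yields positions, so select rows of the Series with .iloc
    lasso_cv.fit(X_train_scaled[train], y_train.iloc[train])
    score = lasso_cv.score(X_train_scaled[test], y_train.iloc[test])
    scores.append(score)
    print("[fold {0}] alpha: {1:.5f}, score: {2:.5f}".format(k, lasso_cv.alpha_, score))
```

An equivalent fix is to convert the target once with y_train.to_numpy() before the loop, so that positional indexing applies to both X and y. Note also that LassoCV already runs its own internal cross-validation (its cv parameter) to choose alpha_, so each outer fold here selects its own alpha.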