Why is Optuna stuck at trial 2 (trial_id=3) after it has calculated all the hyperparameters?
I am using Optuna to tune the hyperparameters of an XGBoost model. I found that it stayed at trial 2 (trial_id=3) for a very long time (244 minutes). But when I checked the SQLite database that records the trial data, I found that all the hyperparameters of trial 2 (trial_id=3) had already been calculated; only the mean squared error value for trial 2 was missing, and Optuna seemed to be stuck at that step. I would like to know why this happens, and how to solve it.
Here is the code:
import optuna
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def xgb_hyperparameter_tuning():
    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 1000, 10000, step=100),
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            "max_depth": trial.suggest_int("max_depth", 1, 20, step=1),
            "learning_rate": trial.suggest_float("learning_rate", 0.0001, 0.2, step=0.001),
            "min_child_weight": trial.suggest_float("min_child_weight", 1.0, 20.0, step=1.0),
            "colsample_bytree": trial.suggest_float("colsample_bytree", 0.1, 1.0, step=0.1),
            "subsample": trial.suggest_float("subsample", 0.1, 1.0, step=0.1),
            "reg_alpha": trial.suggest_float("reg_alpha", 0.0, 11.0, step=0.1),
            "reg_lambda": trial.suggest_float("reg_lambda", 0.0, 11.0, step=0.1),
            "num_parallel_tree": 10,
            "random_state": 16,
            "n_jobs": 10,
            "early_stopping_rounds": 1000,
        }
        model = XGBRegressor(**params)
        # Score with raw MSE (lower is better), matching direction="minimize" below.
        mse = make_scorer(mean_squared_error)
        cv = cross_val_score(estimator=model, X=X_train, y=log_y_train, cv=20, scoring=mse, n_jobs=-1)
        return cv.mean()

    study = optuna.create_study(
        study_name="HousePriceCompetitionXGB",
        direction="minimize",
        storage="sqlite:///house_price_competition_xgb.db",
        load_if_exists=True,
    )
    study.optimize(objective, n_trials=100)
    return None

xgb_hyperparameter_tuning()
Here is the output:
[I 2021-11-16 10:06:27,522] A new study created in RDB with name: HousePriceCompetitionXGB
[I 2021-11-16 10:08:40,050] Trial 0 finished with value: 0.03599314763859092 and parameters: {'n_estimators': 5800, 'booster': 'gblinear', 'max_depth': 4, 'learning_rate': 0.1641, 'min_child_weight': 17.0, 'colsample_bytree': 0.4, 'subsample': 0.30000000000000004, 'reg_alpha': 10.8, 'reg_lambda': 7.6000000000000005}. Best is trial 0 with value: 0.03599314763859092.
[I 2021-11-16 10:11:55,830] Trial 1 finished with value: 0.028514652199592445 and parameters: {'n_estimators': 6600, 'booster': 'gblinear', 'max_depth': 17, 'learning_rate': 0.0821, 'min_child_weight': 20.0, 'colsample_bytree': 0.7000000000000001, 'subsample': 0.2, 'reg_alpha': 1.2000000000000002, 'reg_lambda': 7.2}. Best is trial 1 with value: 0.028514652199592445.
Here is the data of the trial_values table in the SQLite database:
trial_value_id | trial_id | objective | value |
---|---|---|---|
1 | 1 | 0 | 0.0359931476385909 |
2 | 2 | 0 | 0.0285146521995924 |
And here is the data of the trial_params table in the SQLite database, where you can see that all the hyperparameters of trial 2 (trial_id=3) have already been calculated:
param_id | trial_id | param_name | param_value | distribution_json |
---|---|---|---|---|
1 | 1 | n_estimators | 5800.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}} |
2 | 1 | booster | 1.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}} |
3 | 1 | max_depth | 4.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}} |
4 | 1 | learning_rate | 0.1641 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}} |
5 | 1 | min_child_weight | 17.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}} |
6 | 1 | colsample_bytree | 0.4 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
7 | 1 | subsample | 0.3 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
8 | 1 | reg_alpha | 10.8 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
9 | 1 | reg_lambda | 7.6 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
10 | 2 | n_estimators | 6600.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}} |
11 | 2 | booster | 1.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}} |
12 | 2 | max_depth | 17.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}} |
13 | 2 | learning_rate | 0.0821 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}} |
14 | 2 | min_child_weight | 20.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}} |
15 | 2 | colsample_bytree | 0.7 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
16 | 2 | subsample | 0.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
17 | 2 | reg_alpha | 1.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
18 | 2 | reg_lambda | 7.2 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
19 | 3 | n_estimators | 7700.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1000, "high": 10000, "step": 100}} |
20 | 3 | booster | 2.0 | {"name": "CategoricalDistribution", "attributes": {"choices": ["gbtree", "gblinear", "dart"]}} |
21 | 3 | max_depth | 4.0 | {"name": "IntUniformDistribution", "attributes": {"low": 1, "high": 20, "step": 1}} |
22 | 3 | learning_rate | 0.1221 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0001, "high": 0.1991, "q": 0.001}} |
23 | 3 | min_child_weight | 3.0 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 1.0, "high": 20.0, "q": 1.0}} |
24 | 3 | colsample_bytree | 0.5 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
25 | 3 | subsample | 0.1 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.1, "high": 1.0, "q": 0.1}} |
26 | 3 | reg_alpha | 10.8 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
27 | 3 | reg_lambda | 1.1 | {"name": "DiscreteUniformDistribution", "attributes": {"low": 0.0, "high": 11.0, "q": 0.1}} |
Although I am not 100% sure, I think I know what happened.
The problem occurs because some parameters do not suit certain booster types, so the trial returns nan and, as a result, gets stuck at the step of computing the MSE score.
To solve the problem, simply remove "booster": "dart".
In other words, using
"booster": trial.suggest_categorical("booster", ["gbtree", "gblinear"]),
instead of
"booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
solves the problem.
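If you would rather keep dart in the search space, a defensive variant is to fail the trial explicitly when cross-validation comes back as nan instead of letting it sit at the scoring step. This is only a minimal sketch under that diagnosis, reusing the question's X_train and log_y_train; the nan check and the abbreviated search space are my own additions:

import math

import optuna
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def objective(trial):
    params = {
        # Same search space as in the question, abbreviated here.
        "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
        "learning_rate": trial.suggest_float("learning_rate", 0.0001, 0.2, step=0.001),
    }
    model = XGBRegressor(**params)
    mse = make_scorer(mean_squared_error)
    cv = cross_val_score(estimator=model, X=X_train, y=log_y_train, cv=20, scoring=mse, n_jobs=-1)
    score = cv.mean()
    # Fail fast instead of hanging: mark the trial as pruned if the folds
    # produced nan, so the study can move on to the next trial.
    if math.isnan(score):
        raise optuna.TrialPruned()
    return score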
I got this idea while tuning my LGBMRegressor model. I found that many trials failed because they returned nan, and they all used the same "boosting_type"="rf". So I removed rf, and all 100 trials completed without any errors. Then I looked into the XGBRegressor problem posted above and found that all the stuck trials shared the same "booster": "dart". So I removed dart, and XGBRegressor ran normally.
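For reference, here is a hedged sketch of how the failed trials can be grouped by parameter after the fact, using Optuna's public API (the study name and storage URL are the ones from the question; the grouping idea is my own):

import optuna
from optuna.trial import TrialState

# Reload the study from the same SQLite storage used in the question.
study = optuna.load_study(
    study_name="HousePriceCompetitionXGB",
    storage="sqlite:///house_price_competition_xgb.db",
)

# List the failed trials and the booster each one used,
# to spot a common culprit such as "dart".
for t in study.get_trials(deepcopy=False, states=(TrialState.FAIL,)):
    print(t.number, t.params.get("booster"))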