Problems in LightGBM internals
I can't understand what is going on with LightGBM (on Windows). I used to think this algorithm was awesome, but now its performance is really poor.
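For comparison (default parameters for each algorithm), this is how LightGBM performed on a simple DIFF metric = (actual - predicted), where lower is better: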
- CatBoostRegressor() - 18142884
- XGBRegressor() - 20235110
- GradientBoostingRegressor() - 20437130
- LGBMRegressor() - 60296698 (version 2.0.5)
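The thread only says DIFF = (actual - predicted), so how the per-row differences are aggregated is not stated. The sketch below assumes a sum over rows and shows why the aggregation matters: a plain signed sum lets over- and under-predictions cancel, while a sum of absolute residuals counts every error. `diff_signed` and `diff_abs` are hypothetical names used only for illustration.

```python
# Hypothetical DIFF metric variants; the original thread does not say
# how (actual - predicted) is aggregated, so both are assumptions.


def _check(actual, predicted):
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")


def diff_signed(actual, predicted):
    """Sum of raw residuals: positive and negative errors cancel out."""
    _check(actual, predicted)
    return sum(a - p for a, p in zip(actual, predicted))


def diff_abs(actual, predicted):
    """Sum of absolute residuals: every error counts, lower is better."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted))


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(diff_signed(actual, predicted))  # 0.0
print(diff_abs(actual, predicted))     # 20.0
```

If the comparison above used the signed form, a model with large but balanced errors could score deceptively well, so it is worth confirming the same aggregation was applied to all four regressors.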
I tried to find better parameters with HyperOpt, but that did not help either:
from hyperopt import hp

LGBM_SPACE = {
'type': 'LGBM',
'task': hp.choice('lgbm_task', ['train', 'prediction']),
'boosting_type': hp.choice('lgbm_boosting_type', ['gbdt', 'dart']),
'objective': hp.choice('lgbm_objective', ['regression']),
'n_estimators': hp.choice('lgbm_n_estimators', range(10, 201, 5)),
'learning_rate': hp.uniform('lgbm_learning_rate', 0.05, 1.0),
'num_leaves': hp.choice('lgbm_num_leaves', range(2, 7, 1)),
'tree_learner': hp.choice('lgbm_tree_learner', ['serial', 'feature', 'data']),
'metric': hp.choice('lgbm_metric', ['l1', 'l2', 'huber', 'fair']),
'huber_delta': hp.uniform('lgbm_huber_delta', 0.0, 1.0),
'fair_c': hp.uniform('lgbm_fair_c', 0.0, 1.0),
'max_depth': hp.choice('lgbm_max_depth', range(3, 11)),
'min_data_in_leaf': hp.choice('lgbm_min_data_in_leaf', range(0, 6, 1)),
'min_sum_hessian_in_leaf': hp.loguniform('lgbm_min_sum_hessian_in_leaf', -16, 5),
'feature_fraction': hp.uniform('lgbm_feature_fraction', 0.0, 1.0),
'feature_fraction_seed': hp.choice('lgbm_feature_fraction_seed', [12345]),
'bagging_fraction': hp.uniform('lgbm_bagging_fraction', 0.0, 1.0),
'bagging_freq': hp.choice('lgbm_bagging_freq', range(0, 16, 1)),
'bagging_seed': hp.choice('lgbm_bagging_seed', [12345]),
'min_gain_to_split': hp.uniform('lgbm_min_gain_to_split', 0.0, 1.0),
'drop_rate': hp.uniform('lgbm_drop_rate', 0.0, 1.0),
'skip_drop': hp.uniform('lgbm_skip_drop', 0.0, 1.0),
'max_drop': hp.choice('lgbm_max_drop', [-1] + list(range(2, 51))),  # list(): in Python 3 a list cannot be concatenated with a range
'drop_seed': hp.choice('lgbm_uniform_drop', [12345]),
'verbose': hp.choice('lgbm_verbose', [-1]),
'num_threads': hp.choice('lgbm_threads', [2]),
}
The best result was 450422301, which is far worse than the results above.
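One thing worth checking in the search space above: some sampled configurations contain keys or values that LightGBM does not accept as-is. `type` is the search script's own tag, `task` is documented as a CLI-only setting (the scikit-learn wrapper always trains in `fit()`), and according to LightGBM's parameter documentation `feature_fraction` and `bagging_fraction` must lie in (0, 1] while `huber_delta` and `fair_c` must be strictly positive, yet `hp.uniform(..., 0.0, 1.0)` can return values at or near 0. The helper below is a hypothetical sketch, not part of LightGBM or hyperopt, and the `EPS` floor is an assumed value.

```python
# Hypothetical helper (not part of LightGBM or hyperopt): clean one sampled
# configuration from a search space like LGBM_SPACE before it is passed
# to LGBMRegressor(**params).

EPS = 1e-6  # assumed small positive floor for parameters that must be > 0


def sanitize_lgbm_params(sample):
    """Return a cleaned copy of `sample`; the input dict is not modified."""
    params = dict(sample)

    # 'type' is a tag used by the search script, not a LightGBM parameter.
    params.pop("type", None)
    # 'task' is a command-line setting; LGBMRegressor.fit() always trains.
    params.pop("task", None)

    # Sampling fractions must lie in (0, 1]; uniform(0.0, 1.0) can return ~0.
    for key in ("feature_fraction", "bagging_fraction"):
        if key in params:
            params[key] = min(max(params[key], EPS), 1.0)

    # Huber delta and Fair c must be strictly positive.
    for key in ("huber_delta", "fair_c"):
        if key in params:
            params[key] = max(params[key], EPS)

    return params


sample = {"type": "LGBM", "task": "train", "num_leaves": 4,
          "feature_fraction": 0.0, "huber_delta": 0.0}
print(sanitize_lgbm_params(sample))
# {'num_leaves': 4, 'feature_fraction': 1e-06, 'huber_delta': 1e-06}
```

Clamping keeps every hyperopt trial runnable; an alternative design is to narrow the `hp.uniform` bounds themselves so invalid values are never sampled.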
The API is used the same way as for every scikit-learn model, for example:
from lightgbm import LGBMRegressor

model = LGBMRegressor()
model.fit(X, Y)
model.predict(XT)
Please try the latest code from the master branch. Inconsistent parameter defaults in the scikit-learn API have been fixed: #1033.
Alternatively, you can add "min_child_weight": 1e-3, "min_child_samples": 20 to your alg_conf.
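The suggested workaround can be sketched as a small dict merge. Here `alg_conf` is assumed to be the reporter's own dict of keyword arguments for `LGBMRegressor`, and `with_suggested_defaults` is a hypothetical helper; the two pinned values are the ones from the reply above, which match the core defaults of `min_sum_hessian_in_leaf` (1e-3) and `min_data_in_leaf` (20).

```python
# Hypothetical helper: pin the two values suggested in the reply while
# letting anything already present in alg_conf take precedence.
SUGGESTED = {"min_child_weight": 1e-3, "min_child_samples": 20}


def with_suggested_defaults(alg_conf):
    """Return a new dict: SUGGESTED values first, then alg_conf overrides."""
    return {**SUGGESTED, **alg_conf}


alg_conf = {"n_estimators": 100, "learning_rate": 0.1}
conf = with_suggested_defaults(alg_conf)
print(conf)
# {'min_child_weight': 0.001, 'min_child_samples': 20, 'n_estimators': 100, 'learning_rate': 0.1}
# The merged dict would then be used as: LGBMRegressor(**conf)
```

Note that `min_child_samples` and `min_child_weight` are aliases of `min_data_in_leaf` and `min_sum_hessian_in_leaf`, which the HyperOpt space above also samples, so it is probably best to use only one spelling of each in a single configuration.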