Grid search with LightGBM example
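I am trying to find the best parameters for a lightgbm model using GridSearchCV from sklearn.model_selection. I have not been able to find a solution that actually works.
I have managed to set up partially working code: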
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
np.random.seed(1)
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
y = pd.read_csv('y.csv')
y = y.values.ravel()
print(train.shape, test.shape, y.shape)
categoricals = ['COL_A','COL_B']
indexes_of_categories = [train.columns.get_loc(col) for col in categoricals]
gkf = KFold(n_splits=5, shuffle=True, random_state=42).split(X=train, y=y)
param_grid = {
    'num_leaves': [31, 127],
    'reg_alpha': [0.1, 0.5],
    'min_data_in_leaf': [30, 50, 100, 300, 400],
    'lambda_l1': [0, 1, 1.5],
    'lambda_l2': [0, 1]
}
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt', objective='binary', num_boost_round=2000, learning_rate=0.01, metric='auc',categorical_feature=indexes_of_categories)
gsearch = GridSearchCV(estimator=lgb_estimator, param_grid=param_grid, cv=gkf)
lgb_model = gsearch.fit(X=train, y=y)
print(lgb_model.best_params_, lgb_model.best_score_)
This seems to work, but it raises a UserWarning:
categorical_feature keyword has been found in params and will be ignored. Please use categorical_feature argument of the Dataset constructor to pass this parameter.
I am looking for a working solution, or a suggestion on how to ensure that lightgbm accepts the categorical arguments in the code above.
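As the warning states, categorical_feature is not one of the LGBMModel arguments. It is relevant to lgb.Dataset instantiation, which in the case of the sklearn API is done directly in the fit() method (see the doc). Thus, in order to pass it through the GridSearchCV optimisation, one has to provide it as an argument of the GridSearchCV.fit() method in the case of sklearn v0.19.1, or as an additional fit_params argument at GridSearchCV instantiation in older sklearn versions.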
If you are struggling with how to pass fit_params, which happened to me as well, this is how you should do it:
fit_params = {'categorical_feature':indexes_of_categories}
clf = GridSearchCV(model, param_grid, cv=n_folds)
clf.fit(x_train, y_train, **fit_params)