Using Ray-Tune with sklearn's RandomForestClassifier
Piecing together various basic and documentation examples, I managed to come up with this:
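
# Imports added for completeness. The ray.tune.suggest path is an assumption:
# it matches the older Ray releases that use this reporter-style API.
from ray.tune import run
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.bayesopt import BayesOptSearch
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# X and y (features and labels) are loaded elsewhere.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

def objective(config, reporter):
    for i in range(config['iterations']):
        model = RandomForestClassifier(
            random_state=0,
            n_jobs=-1,
            max_depth=None,
            # Cast to int because the search may propose float values.
            n_estimators=int(config['n_estimators']),
            min_samples_split=int(config['min_samples_split']),
            min_samples_leaf=int(config['min_samples_leaf']),
        )
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Feed the score back to tune?
        reporter(precision=precision_score(y_test, y_pred, average='macro'))

# (low, high) bounds for each hyperparameter.
space = {'n_estimators': (100, 200),
         'min_samples_split': (2, 10),
         'min_samples_leaf': (1, 5)}

algo = BayesOptSearch(
    space,
    metric="precision",
    mode="max",
    utility_kwargs={
        "kind": "ucb",
        "kappa": 2.5,
        "xi": 0.0
    },
    verbose=3
)

scheduler = AsyncHyperBandScheduler(metric="precision", mode="max")

config = {
    "num_samples": 1000,
    "config": {
        "iterations": 10,
    }
}

results = run(objective,
              name="my_exp",
              search_alg=algo,
              scheduler=scheduler,
              stop={"training_iteration": 400, "precision": 0.80},
              resources_per_trial={"cpu": 2, "gpu": 0.5},
              **config)

print(results.dataframe())
print("Best config: ", results.get_best_config(metric="precision"))
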
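It runs, and I am able to get the best configuration at the end. My doubt is mainly about the objective function: have I written it correctly? I could not find any examples to compare against.

Follow-up question:
- What is num_samples in the config object? Is it the number of samples drawn from the overall training data for each trial?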
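
Tune now has native sklearn bindings: https://github.com/ray-project/tune-sklearn
Can you give that a try?

To answer your original question, the objective function looks fine; num_samples is the total number of hyperparameter configurations you want to try. It has nothing to do with how many rows of training data each trial sees.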
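
To make the meaning of num_samples concrete, here is a minimal sketch in plain Python. It is not Ray Tune's implementation, and it uses simple random search rather than the Bayesian optimization from the question. The helper names (sample_config, toy_objective, random_search) are hypothetical, and the objective is a made-up stand-in for "fit a model and return its precision". The only point is that num_samples counts configurations, with one trial per configuration.

```python
import random

# Search space copied from the question: (low, high) bounds per hyperparameter.
space = {
    "n_estimators": (100, 200),
    "min_samples_split": (2, 10),
    "min_samples_leaf": (1, 5),
}

def sample_config(rng):
    """Draw one hyperparameter configuration uniformly from the space."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}

def toy_objective(config):
    """Hypothetical stand-in score; a real objective would fit and evaluate a model."""
    return 1.0 / (1.0 + abs(config["n_estimators"] - 150))

def random_search(num_samples, seed=0):
    """Evaluate num_samples configurations, one trial each."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        config = sample_config(rng)
        trials.append((toy_objective(config), config))
    return trials

trials = random_search(num_samples=5)
best_score, best_config = max(trials, key=lambda t: t[0])
print(len(trials), "configurations tried; best:", best_config)
```

Every trial here would see the full training set; only the hyperparameters change from one trial to the next.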

Also, you will want to remove the for loop from the training function:

def objective(config, reporter):
    model = RandomForestClassifier(
        random_state=0,
        n_jobs=-1,
        max_depth=None,
        n_estimators=int(config['n_estimators']),
        min_samples_split=int(config['min_samples_split']),
        min_samples_leaf=int(config['min_samples_leaf']),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Feed the score back to tune
    reporter(precision=precision_score(y_test, y_pred, average='macro'))
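
One likely reason the loop is redundant: with random_state fixed and the same data and hyperparameters, every pass fits an identical forest and reports an identical score. The sketch below checks this with scikit-learn alone. The synthetic dataset from make_classification and the helper name fit_and_score are assumptions for illustration, since the original X and y are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the thread does not show X and y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def fit_and_score(config):
    """Fit one forest for the given config and return macro precision."""
    model = RandomForestClassifier(
        random_state=0,  # fixed seed, as in the question
        n_estimators=int(config["n_estimators"]),
        min_samples_split=int(config["min_samples_split"]),
        min_samples_leaf=int(config["min_samples_leaf"]),
    )
    model.fit(X_train, y_train)
    return precision_score(y_test, model.predict(X_test), average="macro")

config = {"n_estimators": 50, "min_samples_split": 2, "min_samples_leaf": 1}

# Repeat the same fit three times, mimicking the loop in the question.
scores = [fit_and_score(config) for _ in range(3)]
print(scores)
```

All three scores come out equal, so repeating the fit inside one trial adds compute without adding information.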