Hyperopt: Optimal parameters change with each rerun
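I am trying to use Bayesian optimization (Hyperopt) to obtain the optimal parameters for an SVM algorithm. However, I find that the optimal parameters change with every run.
Below is a simple reproducible case. Can you shed some light on this?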
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.svm import SVC
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.model_selection import StratifiedShuffleSplit

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

def hyperopt_train_test(params):
    clf = svm.SVC(**params)
    return cross_val_score(clf, X, y).mean()

space4svm = {
    'C': hp.loguniform('C', -3, 3),
    'gamma': hp.loguniform('gamma', -3, 3),
}

def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()
best = fmin(f, space4svm, algo=tpe.suggest, max_evals=1000, trials=trials)
print('best:')
print(best)
Here are some of the optimal values.
best: {'C': 0.08776548401545513, 'gamma': 1.447360198193232}
best: {'C': 0.23621788050791617, 'gamma': 1.2467882092108042}
best: {'C': 0.3134163250819116, 'gamma': 1.0984778155489887}
That is because, during the execution of fmin, hyperopt randomly draws different values of 'C' and 'gamma' from the defined search space space4svm on each run of the program.
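To illustrate the effect of the seed on its own, here is a minimal NumPy-only sketch. It assumes only that hp.loguniform(label, low, high) draws exp(uniform(low, high)); the helper name draw_loguniform is made up for this illustration, and the sketch reproduces just the random sampling, not hyperopt's TPE algorithm.

```python
import numpy as np

def draw_loguniform(rng, low=-3.0, high=3.0, size=3):
    """Hypothetical helper: draw exp(uniform(low, high)) values from rng."""
    return np.exp(rng.uniform(low, high, size=size))

# Unseeded generators are initialised from fresh OS entropy,
# so two simulated "runs" produce different candidate values.
unseeded_a = draw_loguniform(np.random.default_rng())
unseeded_b = draw_loguniform(np.random.default_rng())

# Generators built from the same fixed seed replay the same stream,
# so two simulated "runs" produce identical candidate values.
seeded_a = draw_loguniform(np.random.default_rng(42))
seeded_b = draw_loguniform(np.random.default_rng(42))

print("unseeded runs equal:", np.array_equal(unseeded_a, unseeded_b))
print("seeded runs equal:  ", np.array_equal(seeded_a, seeded_b))
print("value range:", np.exp(-3.0), "to", np.exp(3.0))
```

With bounds of -3 and 3, every candidate for 'C' and 'gamma' lies between exp(-3) and exp(3), which is consistent with the best values reported in the question.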
To fix this and get deterministic results, you need to use the 'rstate' param of fmin:
rstate :
numpy.RandomState, default numpy.random or `$HYPEROPT_FMIN_SEED`
Each call to `algo` requires a seed value, which should be different
on each call. This object is used to draw these seeds via `randint`.
The default rstate is numpy.random.RandomState(int(env['HYPEROPT_FMIN_SEED']))
if the 'HYPEROPT_FMIN_SEED' environment variable is set to a non-empty
string, otherwise np.random is used in whatever state it is in.
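So if rstate is not set explicitly, fmin checks by default whether the environment variable 'HYPEROPT_FMIN_SEED' is set. If it is not, np.random is used in whatever state it happens to be in, so the draws differ on every run.
You can use it like this: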
rstate = np.random.RandomState(42)  # any number works here, as long as it is fixed
# Note: newer hyperopt releases expect np.random.default_rng(42) here instead.
best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100, trials=trials, rstate=rstate)