Duplicated trials in Hyperopt library

Question

I am using the hyperopt library to tune my model.

This is my search space:

from hyperopt import hp

search_space = {
    'max_df': hp.choice('max_df', [1, 0.95, 0.9]),
    'cls': hp.choice('cls', ['A', 'B', 'C', 'D', 'E', 'F', 'G']),
    'ngram_range': hp.choice('ngram_range', [
        (2, 3), (2, 4), (2, 5), (2, 6),
        (3, 4), (3, 5), (3, 6),
        (4, 5), (4, 6), (5, 6),
    ]),
}

This is my code:

trials = Trials()
best = fmin(self.objective_function, space=search_space, algo=tpe.suggest, max_evals=140, trials=trials)
bp = trials.best_trial['result']['Params']
print(bp)

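Given the number of possible parameter values, the library should need 210 iterations to cover the whole search space (3 * 7 * 10).

I set max_evals to 140, which is smaller than the total number of combinations.

After each iteration I save the parameters together with the score. I found that, even though I run fewer evaluations than there are combinations (140 instead of 210), some trials (iterations) use duplicated parameters.

Does the hyperopt library follow a grid search technique, or does it pick a random combination of parameters in each trial?

I am asking about the parameter selection process, not the optimization technique (e.g. Bayesian optimization).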

Answer 1

In your code you are using TPE (Tree-structured Parzen Estimator), which you can read more about in this paper by the author of hyperopt. I can't tell you too much here about this algorithm, but note that each such search starts with a pre-defined "startup" period. By default, Hyperopt uses 20 random trials to "seed" TPE (see here). Since your search space is fairly small and these random trials are picked independently, that may already account for your duplicates.
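To get a feel for how likely duplicates are during the startup period alone, here is a rough birthday-problem style estimate. It is only a sketch: it assumes the 20 startup trials are drawn independently and uniformly over all 210 combinations, and the function name is made up for illustration.

```python
def prob_at_least_one_duplicate(n_combinations, n_draws):
    """Chance that n_draws independent uniform draws contain a repeat."""
    p_all_distinct = 1.0
    for i in range(n_draws):
        # The (i + 1)-th draw must avoid the i combinations already seen.
        p_all_distinct *= (n_combinations - i) / n_combinations
    return 1.0 - p_all_distinct


n_combinations = 3 * 7 * 10  # max_df * cls * ngram_range options
n_startup = 20               # assumed number of random startup trials

p = prob_at_least_one_duplicate(n_combinations, n_startup)
print(f"P(at least one duplicate in {n_startup} random trials) = {p:.2f}")
```

Under these assumptions the chance of seeing at least one repeat within the first 20 trials comes out at roughly 60%, so a handful of duplicates over 140 evaluations is not surprising.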

If you like, instead of TPE you can also use pure random search or a variant called ATPE in hyperopt.
