qloguniform search space setting issue in Hyperopt

Question

I'm trying to use hyperopt to tune my ML model, but I'm having trouble using qloguniform as the search space. I took the example from the official wiki and changed the search space.

```
import pickle
import time
# utf8
import pandas as pd
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
        }
trials = Trials()
best = fmin(objective,
    space=hp.qloguniform('x', np.log(0.001), np.log(0.1), np.log(0.001)),
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)
pd.DataFrame(trials.trials)
```

But I get the following error.

ValueError: ('negative arg to lognormal_cdf', array([-3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764, -3.45387764]))

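I also tried it without the log transform, as shown below, but then the sampled values come out exponentiated (e.g. 1.017, 1.0008, 1.02456), which is not what I want, although it is consistent with the documentation.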

```
hp.qloguniform('x', 0.001, 0.1, 0.001)
```

Thanks

Answer 1

The problem seems to be the last argument to hp.qloguniform, q, and how tpe.suggest uses it.

First, let's discuss q. According to the documentation:
hp.qloguniform(label, low, high, q)
```
round(exp(uniform(low, high)) / q) * q 
```
Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
Here q is a "quantizer": it restricts the outputs of the defined space to multiples of q. For example, this is what happens inside qloguniform:
```
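import numpy as np
from hyperopt import pyll, hp

n_samples = 10

# Draw from the unquantized log-uniform space; the bounds are given in log space.
space = hp.loguniform('x', np.log(0.001), np.log(0.1))
evaluated = [pyll.stochastic.sample(space) for _ in range(n_samples)]
# Example output (random, will differ between runs):
# [0.04645754, 0.0083128 , 0.04931957, 0.09468335, 0.00660693,
#  0.00282584, 0.01877195, 0.02958924, 0.00568617, 0.00102252]

# Apply the quantization step by hand: round to the nearest multiple of q.
q = 0.005
qevaluated = np.round(np.array(evaluated) / q) * q
# Output for the samples above:
# [0.045, 0.01 , 0.05 , 0.095, 0.005, 0.005, 0.02 , 0.03 , 0.005, 0.   ]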
```
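Compare evaluated and qevaluated here. Every value in qevaluated is a multiple of q; in other words, the samples are quantized in "intervals" (or steps) of q. You can try changing q to get a better feel for it.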

The q you defined in the question is very large in magnitude (and negative) compared with the range of the generated samples (0.001 to 0.1):
```
np.log(0.001)
# Output: -6.907755278982137
```
So the output for every value here will be 0.
```
q = np.log(0.001)
qevaluated = np.round(np.array(evaluated)/q) * q
# Output: [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
```
Now for tpe.suggest (see Section 4 of the TPE paper): TPE uses a tree of different estimators to optimize the search process, during which it divides the search space depending on the generator of the space (in this case qloguniform). See the hyperopt source code for details. To divide the space into multiple parts, it uses q.

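But since all the points in your space are 0.0 (as described above), this negative q produces invalid bounds for lognormal_cdf, which does not accept negative arguments, hence the error. Note that the value repeated in the error message, -3.45387764, is exactly np.log(0.001) / 2, i.e. q / 2.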
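As a rough check of this explanation, the sketch below reproduces the number that appears in the error message. It assumes, as a simplification of what TPE does internally, that each quantized sample is given an interval edge of the form sample + q / 2; the variable names are illustrative and not taken from hyperopt.

```python
import math

q = math.log(0.001)  # the q from the question, about -6.9078

# Raw draws lie in [0.001, 0.1]; quantizing them with this q collapses each one to 0.
raw_samples = [0.001, 0.01, 0.05, 0.1]
quantized = [round(s / q) * q for s in raw_samples]

# Assumed interval edge around a quantized sample: sample + q / 2.
upper_edges = [s + q / 2 for s in quantized]

print(quantized)
print(upper_edges)
```

Every edge comes out negative, which is the kind of argument lognormal_cdf rejects.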

Long story short, your usage of q is incorrect. As you said in the comment:

Also q value should not be used inside the log uniform/log normal random sampling according to round(exp(uniform(low, high)) / q) * q

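So you should only supply values of q that make sense for the space you want. Here, since you want to generate values between 0.001 and 0.1, q should be comparable in size to those values.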
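To get a feel for what "comparable" means, the following sketch applies the documented formula with a few candidate values of q and counts how many distinct values survive the rounding. It uses only the standard library rather than hyperopt, and the helper name `sample_qloguniform` is hypothetical.

```python
import math
import random

def sample_qloguniform(rng, low, high, q):
    """Draw one value following round(exp(uniform(low, high)) / q) * q."""
    return round(math.exp(rng.uniform(low, high)) / q) * q

rng = random.Random(0)
low, high = math.log(0.001), math.log(0.1)

distinct_counts = {}
for q in (0.001, 0.01, 0.5):
    draws = [sample_qloguniform(rng, low, high, q) for _ in range(1000)]
    # Round before de-duplicating so floating-point noise does not inflate the count.
    distinct_counts[q] = len({round(d, 6) for d in draws})
    print(q, distinct_counts[q], min(draws), max(draws))
```

With q = 0.5, which is larger than every value in the range, every draw rounds to 0. This is a milder version of what happened with q = np.log(0.001).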

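I agree that you should pass np.log(0.001) and np.log(0.1) to qloguniform, but that is only so that the output values end up between 0.001 and 0.1. So don't use np.log in q. q should be chosen on the scale of the generated values.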

python

machine-learning

hyperparameters

hyperopt