Machine learning: optimal parameter values in reasonable time

Sorry if this is a duplicate.

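I have a two-class prediction model with n configurable (numeric) parameters. The model can work pretty well if these parameters are tuned properly, but good values for them are hard to find. I used grid search for this (providing, say, m values for each parameter). This yields m^n training runs, which is very time-consuming even when run in parallel on a machine with 24 cores.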

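I tried fixing all parameters but one and varying only that one (which yields m × n runs), but it is not obvious to me what to do with the results I got. Here is a sample plot of precision (triangles) and recall (dots) for negative (red) and positive (blue) samples: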

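Simply taking the "winner" value for each parameter obtained this way and combining them does not lead to the best (or even good) prediction results. I thought about building a regression on the parameter sets with precision/recall as the dependent variable, but I don't think a regression with more than 5 independent variables would be much faster than the grid search scenario.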
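To make that observation concrete, here is a minimal sketch of how combining per-parameter winners can miss the optimum when parameters interact. The `loss` function, the parameter names `a` and `b`, the value grid, and the baseline are all hypothetical stand-ins chosen for illustration; they are not the model from the question.

```python
from itertools import product

def loss(a, b):
    # Hypothetical loss with an interaction: a and b must move together.
    return (a - b) ** 2 + 0.1 * (a + b - 2) ** 2

values = [0.0, 0.5, 1.0, 1.5, 2.0]   # m = 5 candidate values per parameter
baseline = {"a": 0.0, "b": 0.0}      # assumed fixed values for the other parameter

def one_at_a_time(loss, values, baseline):
    """Vary one parameter at a time around the baseline and keep each winner."""
    winners = {}
    for name in baseline:
        def score(v, name=name):
            trial = dict(baseline, **{name: v})
            return loss(**trial)
        winners[name] = min(values, key=score)
    return winners

winners = one_at_a_time(loss, values, baseline)            # m * n evaluations
combined = (winners["a"], winners["b"])
grid_best = min(product(values, repeat=2), key=lambda p: loss(*p))  # m ** n evaluations

print("combined winners:", combined, "loss:", loss(*combined))
print("full grid best:  ", grid_best, "loss:", loss(*grid_best))
```

In this toy setup each one-dimensional sweep prefers to stay at the baseline, because moving only one parameter increases the `(a - b)` term, while the full grid finds the point where both parameters have moved.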

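What would you suggest for finding good parameter values within a reasonable estimated time? Sorry if this has an obvious (or well-documented) answer.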

I would suggest the Simplex Algorithm with Simulated Annealing:

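Very simple to use. Just give it n + 1 points and let it run until some configurable stopping value (either a number of iterations or convergence).
Implemented in every possible language.
Doesn't require derivatives.
More resilient to local optima than the method you are currently using.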

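I would use a randomized grid search (pick random values for each parameter within a range you deem reasonable, and evaluate each such randomly chosen configuration), which you can run for as long as you can afford. This paper runs some experiments showing that this is at least as good as a grid search: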

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time.

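For what it's worth, I have used scikit-learn's random grid search for a problem that required optimizing about 10 hyper-parameters for a text classification task, with very good results in only around 1000 iterations.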
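For readers who want to see the idea without any library, here is a minimal standard-library sketch of random search. It is not scikit-learn's implementation (there the corresponding class is `RandomizedSearchCV` in `sklearn.model_selection`). The `toy_loss` objective, the parameter names `C` and `gamma`, and their ranges are hypothetical placeholders; a real objective would train the classifier and return something like one minus the cross-validated F1 score.

```python
import math
import random

def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials random configurations and return the best one found.

    space maps a parameter name to (low, high, scale), where scale is
    "linear" or "log" (log-uniform sampling, useful for ranges spanning decades).
    """
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = {}
        for name, (low, high, scale) in space.items():
            if scale == "log":
                params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                params[name] = rng.uniform(low, high)
        score = objective(params)
        if score < best_score:              # lower is better
            best_params, best_score = params, score
    return best_params, best_score

def toy_loss(params):
    # Hypothetical stand-in for "train the model and return a validation error".
    return (math.log10(params["C"]) - 1.0) ** 2 + (params["gamma"] - 0.3) ** 2

space = {
    "C": (1e-3, 1e3, "log"),         # assumed range
    "gamma": (0.0, 1.0, "linear"),   # assumed range
}

few_params, few_score = random_search(toy_loss, space, n_trials=10)
many_params, many_score = random_search(toy_loss, space, n_trials=200)
print(few_score, many_score)
```

The number of trials is set by the time budget rather than by m^n, and since the trials are independent they can be spread across all available cores.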
