Hyperparameter tuning with Google Cloud ML Engine and XGBoost

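I'm trying to replicate the hyperparameter tuning example reported in this link, but I want to use XGBoost (through its scikit-learn API) in my training application instead of TensorFlow.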

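I can run multiple trials in a single job, one for each hyperparameter combination. However, the Training output object returned by ML Engine does not contain the finalMetric field that reports the metric information (see the differences in the images below).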

What I get from the example in the link above: Training output object with TensorFlow training app

What I get when running my training application with XGBoost: Training output object with XGBoost training app

Is there a way for XGBoost to return training metrics to ML Engine?

This process seems to be automatic for TensorFlow, as stated in the documentation:

How Cloud ML Engine gets your metric

You may notice that there are no instructions in this documentation for passing your hyperparameter metric to the Cloud ML Engine training service. That's because the service monitors TensorFlow summary events generated by your training application and retrieves the metric.

Is there a similar mechanism for XGBoost?

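For now, I could always dump each metric result to a file at the end of each trial and then analyze the files manually to select the best parameters. But by doing so, would I lose the automated mechanism that Cloud ML Engine provides, especially the "ALGORITHM_UNSPECIFIED" hyperparameter search algorithm?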

That is:

ALGORITHM_UNSPECIFIED: [...] applies Bayesian optimization to search the space of possible hyperparameter values, resulting in the most effective technique for your set of hyperparameters.

Hyperparameter tuning support for XGBoost is implemented differently. We created the cloudml-hypertune Python package to help with it. We're still working on the public documentation for it. In the meantime, you can refer to this staging sample to learn how to use it.
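A minimal sketch of an XGBoost trial script that reports its metric through cloudml-hypertune. The HyperTune class and its report_hyperparameter_tuning_metric method come from that package; everything else (the flag names --max_depth, --learning_rate and --n_estimators, the breast-cancer dataset, the metric tag "accuracy", and using the boosting-round count as global_step) is a hypothetical choice made for illustration. The third-party imports are kept inside the functions so the flag parsing can be exercised without xgboost, scikit-learn or cloudml-hypertune installed.

```python
"""Hypothetical XGBoost trial script for Cloud ML Engine hyperparameter tuning.

Assumptions: flag names, dataset, and the metric tag are placeholders.
"""
import argparse

# Assumed tag; it must match hyperparameterMetricTag in the job's tuning spec.
METRIC_TAG = "accuracy"


def parse_args(argv=None):
    # The tuning service passes each trial's hyperparameter values to the
    # trainer as command-line arguments named after the spec's parameters.
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_depth", type=int, default=3)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--n_estimators", type=int, default=100)
    # parse_known_args ignores extra flags this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train_and_evaluate(args):
    # Local imports so parse_args() works without these packages installed.
    import xgboost as xgb
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = xgb.XGBClassifier(
        max_depth=args.max_depth,
        learning_rate=args.learning_rate,
        n_estimators=args.n_estimators,
    )
    model.fit(X_train, y_train)
    # Held-out accuracy is the metric the tuning service will optimize.
    return accuracy_score(y_test, model.predict(X_test))


def report_metric(value, step):
    import hypertune  # provided by the cloudml-hypertune package

    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=METRIC_TAG,
        metric_value=value,
        global_step=step,
    )


def main(argv=None):
    args = parse_args(argv)
    accuracy = train_and_evaluate(args)
    # The number of boosting rounds stands in for a training step here.
    report_metric(accuracy, args.n_estimators)
```

In a trainer package, the entry module would call main(), typically under an if __name__ == "__main__": guard. The tag passed to the reporter has to match the hyperparameterMetricTag set for the job.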

Sara Robinson at Google put together a nice post on how to do this. Rather than rehashing it and claiming it as my own, I'll post the link here for anyone else who runs into this issue:

https://sararobinson.dev/2019/09/12/hyperparameter-tuning-xgboost.html
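On the question of losing ALGORITHM_UNSPECIFIED: as far as we can tell, the search algorithm is chosen in the job's tuning spec rather than in the trainer, so a trainer that reports through cloudml-hypertune should still get the automated search. Below is a sketch of the hyperparameters section of a jobs.create request body written as a Python dict; the parameter names, ranges and trial counts are hypothetical placeholders, and each parameterName is assumed to match a command-line flag the trainer accepts.

```python
# Hypothetical "hyperparameters" section of trainingInput for a jobs.create
# request. Names, ranges, and trial counts are placeholders.
METRIC_TAG = "accuracy"  # must equal the tag the trainer reports

hyperparameter_spec = {
    "goal": "MAXIMIZE",  # higher accuracy is better
    "hyperparameterMetricTag": METRIC_TAG,
    "maxTrials": 20,
    "maxParallelTrials": 2,
    # Bayesian optimization, per the documentation quoted in the question.
    "algorithm": "ALGORITHM_UNSPECIFIED",
    "params": [
        {"parameterName": "max_depth", "type": "INTEGER",
         "minValue": 2, "maxValue": 10, "scaleType": "UNIT_LINEAR_SCALE"},
        {"parameterName": "learning_rate", "type": "DOUBLE",
         "minValue": 0.01, "maxValue": 0.3, "scaleType": "UNIT_LOG_SCALE"},
        {"parameterName": "n_estimators", "type": "INTEGER",
         "minValue": 50, "maxValue": 500, "scaleType": "UNIT_LINEAR_SCALE"},
    ],
}

# Other trainingInput fields (package URIs, module, region, ...) omitted.
training_input = {"hyperparameters": hyperparameter_spec}
```

The omitted trainingInput fields are the same ones a non-tuning training job needs; only the hyperparameters block is specific to tuning.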
