Rapids CUML Random Forest Regression Model Inference

Question

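I am using the random forest regression model from the CUML 0.10.0 library on Google Colab, and I am having trouble getting predictions from the model. After training finishes successfully, I call the .predict method to run inference on a very large array of shape (41697600, 11). However, I get the following error: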

TypeError: GPU predict model only accepts float32 dtype as input, convert the data to float32 or use the CPU predict with `predict_model='CPU'`.

The error persists even after I convert the input numpy array's dtype to float32 and pass predict_model='CPU' to the predict method.

Here is the code I used, for reference:

array = X_test.values.astype('float32')
predictions = cuml_model.predict(array, predict_model='CPU', output_class=False, algo='BATCH_TREE_REORG')

Model summary:

<bound method RandomForestRegressor.print_summary of RandomForestRegressor(n_estimators=10, max_depth=16, handle=<cuml.common.handle.Handle object at 0x7fbfa342e888>, max_features='auto', n_bins=8, n_streams=8, split_algo=1, split_criterion=2, bootstrap=True, bootstrap_features=False, verbose=False, min_rows_per_node=2, rows_sample=1.0, max_leaves=-1, accuracy_metric='mse', quantile_per_tree=False, seed=-1)>

Answer 1

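This error message is very confusing. I believe it fails because the training was done in float64, not because of the prediction input. So if you train with float32 data instead, it should all work. The optimized GPU prediction implementation currently supports only float32 models. You should be able to fall back to the slower CPU prediction, but this exception prevents that.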

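I have filed this as a bug, and we will try to fix it in an upcoming release. Feel free to follow along there or add any further issues you run into: https://github.com/rapidsai/cuml/issues/1406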
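The float32 training suggestion above can be sketched roughly as follows. This is only an illustration: `as_float32` is a hypothetical helper, the random arrays stand in for the asker's `X_train`, `y_train`, and `X_test`, and the cuML calls are left as comments because they assume a GPU environment with cuML installed.

```python
import numpy as np


def as_float32(data):
    """Return `data` as a C-contiguous float32 NumPy array (hypothetical helper)."""
    return np.ascontiguousarray(data, dtype=np.float32)


# Stand-in data; in the question these would come from X_train.values,
# y_train.values and X_test.values.
rng = np.random.default_rng(0)
X_train = rng.random((100, 11))  # NumPy produces float64 by default
y_train = rng.random(100)
X_test = rng.random((20, 11))

# Cast everything that goes into training AND prediction to float32.
X_train32 = as_float32(X_train)
y_train32 = as_float32(y_train)
X_test32 = as_float32(X_test)

print(X_train32.dtype, y_train32.dtype, X_test32.dtype)

# With cuML available, the model would then be fit on the float32 arrays
# (assumed usage, mirroring the question's setup):
#   from cuml.ensemble import RandomForestRegressor
#   cuml_model = RandomForestRegressor(n_estimators=10, max_depth=16)
#   cuml_model.fit(X_train32, y_train32)
#   predictions = cuml_model.predict(X_test32)
```

The point of the sketch is that the cast happens before `fit`, not only before `predict`, since the dtype used for training appears to be what triggers the exception.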

Answer 2

I got the same error with int64 data, even though the message refers to float64. So anyone facing the same problem can simply convert the int64 data to float32 or int32.
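A minimal sketch of that conversion, assuming NumPy input. `to_float32_checked` is a hypothetical helper, not part of cuML. It also reports whether the cast round-trips exactly, because float32 cannot represent every integer above 2**24.

```python
import numpy as np


def to_float32_checked(arr):
    """Cast `arr` to float32 and report whether the cast was lossless.

    Hypothetical helper: returns (converted_array, exact), where `exact`
    is True when casting back to the original dtype reproduces the input.
    """
    arr = np.asarray(arr)
    if arr.dtype == np.float32:
        return arr, True
    converted = arr.astype(np.float32)
    # Round-trip to the original dtype to detect values that changed.
    exact = bool(np.array_equal(converted.astype(arr.dtype), arr))
    return converted, exact


# Small int64 values survive the conversion unchanged.
features = np.array([[1, 2], [3, 4]], dtype=np.int64)
features32, exact = to_float32_checked(features)
print(features32.dtype, exact)

# Integers above 2**24 may be rounded when stored as float32.
large = np.array([2**24 + 1], dtype=np.int64)
large32, large_exact = to_float32_checked(large)
print(large32.dtype, large_exact)
```

If the check reports a lossy cast for identifier-like integer columns, it may be worth rescaling or re-encoding those columns before training rather than casting them directly.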


python

machine-learning

random-forest

machine-learning-model

rapids