Exception during xgboost prediction: can not initialize DMatrix from DMatrix

Question

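I trained an xgboost model in Python using the Scikit-Learn Python API and serialized it with the pickle library. I uploaded the model to ML Engine, but when I try to run online predictions I get the following exception: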

Prediction failed: Exception during xgboost prediction: can not initialize DMatrix from DMatrix

A sample of the JSON I'm using for prediction is as follows:

{  
   "instances":[  
      [  
         24.90625,
         21.6435643564356,
         20.3762376237624,
         24.3679245283019,
         30.2075471698113,
         28.0947368421053,
         16.7797359774725,
         14.9262079299572,
         17.9888028979966,
         15.3333284503293,
         19.6535308744024,
         17.1501961307627,
         0.0,
         0.0,
         0.0,
         0.0,
         0.0,
         509.0,
         497.0,
         439.0,
         427.0,
         407.0,
         1.0,
         1.0,
         1.0,
         1.0,
         1.0,
         2.0,
         23.0,
         10.0,
         58.0,
         11.0,
         20.0,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.3617021276596,
         23.9423076923077,
         26.3082269243683,
         23.6212606363851,
         22.6752334301282,
         27.4343583104833,
         34.0090408101173,
         11.1991944104063,
         7.33420726455092,
         8.15160392948917,
         11.4119236389594,
         17.9429092915607,
         18.0573102225845,
         32.8902876598084,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.00286123032904149,
         -0.0028328611898017,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0534138904223018,
         0.0531491870801522
      ]
   ]
}
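For readers who want to build a request body of this shape programmatically, here is a minimal sketch using only the standard library. The helper name `build_request` is hypothetical; the only thing taken from the question is the top-level "instances" key holding a list of feature rows. A real request would need the full feature vector rather than the three values used here for brevity.

```python
import json


def build_request(rows):
    """Wrap feature rows in an {"instances": [...]} body (hypothetical helper)."""
    # Cast every value to a plain float so NumPy scalars, if any, serialize cleanly.
    instances = [[float(value) for value in row] for row in rows]
    return json.dumps({"instances": instances})


# First three feature values of the example instance, shortened for illustration.
body = build_request([[24.90625, 21.6435643564356, 20.3762376237624]])
print(body)
```

The resulting string can be written to a file or sent as the request body, depending on how the prediction service is called.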

I trained my model with the following code:

import xgboost as xgb


def _train_model(X, y):
    # Fit a gradient-boosted classifier through the Scikit-Learn wrapper.
    clf = xgb.XGBClassifier(max_depth=6,
                            learning_rate=0.01,
                            n_estimators=100,
                            n_jobs=-1)
    clf.fit(X, y)
    return clf

where X and y are both numpy.ndarray:

Type of X: <class 'numpy.ndarray'> Type of y: <class 'numpy.ndarray'>

I'm using xgboost 0.72.1, Python 3.5, and ML runtime 1.9.

Does anyone know what the root cause of the problem is?

Thanks!

Answer 1

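The issue appears to be caused by pickling. I was able to reproduce it and am working on a fix, but in the meantime could you try exporting your classifier like this instead?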

clf._Booster.save_model('./model.bst')

That should unblock you for now. If it doesn't, feel free to reach out to cloudml-feedback@google.com.

Answer 2

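I ran into a similar problem, or a feature mismatch, when I tried to score test data with a trained XGBoost model that had been dumped in .pkl format. After saving the model in .bst format, however, I was able to score the same training data without any issues. As far as XGBoost is concerned, the .pkl and .bst formats seem to be implemented differently.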

Answer 3

Going one step further, and answering kuza's question above about loading the saved model:

Saving the model:

clf._Booster.save_model('./model.bst')

Loading the saved model:

import xgboost

model = xgboost.Booster({'nthread': 4})  # initialize before loading model
model.load_model('./model.bst')  # load model

This solved the 2 issues I was having when using pickle on the model. Issue 1 was a strange exception: ValueError: feature_names mismatch:

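Also check whether you are calling predict_proba on the loaded model and getting a strange exception. The workaround is to call the plain predict function instead of predict_proba.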

python-3.x

xgboost

google-cloud-ml