MLflow webserver returns 400 status, "Incompatible input types for column X. Can not safely convert float64 to <U0."

Question

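I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The aim of the model is to detect web crawlers from server logs, and the response_length column is one of my features. After serving the model, I tested the web service by sending the request below, which contains the first 20 columns of the training data.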

$ curl --location --request POST '127.0.0.1:8000/invocations' \
        --header 'Content-Type: text/csv' \
        --data-binary '@datasets/test.csv'

But the web server responds with status code 400 (BAD REQUEST) and this JSON body:

{
    "error_code": "BAD_REQUEST",
    "message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}

Here is the MLflow Tracking component log from building the model:

[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total=   4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total=   0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370

---------- logged params ----------
{'memory': 'None',
 'pca': 'PCAEstimator()',
 'rule_based': 'RuleBasedEstimator()',
 'steps': "[('transform', <log_transformer.LogTransformer object at "
          "0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
          'RuleBasedEstimator())]',
 'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
 'verbose': 'True'}

---------- logged metrics ----------
{}

---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}

---------- logged artifacts ----------
['model/MLmodel',
 'model/conda.yaml',
 'model/model.pkl',
 'model/requirements.txt']

It would be very helpful if someone could tell me exactly how to fix this model serving problem.

Answer 1

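The problem comes from the schema inference that the mlflow.utils.autologging_utils WARNING refers to.

When the model is created, the input data signature is saved in the MLmodel file. Change the signature input type of response_length from string to double by using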

{"name": "response_length", "type": "double"}

instead of

{"name": "response_length", "type": "string"}

so that no conversion is needed. After serving the model with the edited MLmodel file, the web server worked as expected.
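
The edit can also be scripted rather than done by hand. The following is only a minimal sketch, assuming the MLmodel file stores signature.inputs as a JSON-encoded string (the usual layout for column-based signatures). The helper name set_column_type and the status_code column are hypothetical and used for illustration only.

```python
import json


def set_column_type(inputs_json: str, column: str, new_type: str) -> str:
    """Return a copy of a signature `inputs` JSON string with one column's type changed.

    Hypothetical helper: it only rewrites the JSON string and does not touch
    the MLmodel file itself.
    """
    columns = json.loads(inputs_json)
    matched = False
    for spec in columns:
        if spec.get("name") == column:
            spec["type"] = new_type
            matched = True
    if not matched:
        raise KeyError(f"column {column!r} not found in signature")
    return json.dumps(columns)


# Example signature fragment; the status_code column is made up for illustration.
original = (
    '[{"name": "response_length", "type": "string"}, '
    '{"name": "status_code", "type": "long"}]'
)
patched = set_column_type(original, "response_length", "double")
print(patched)
```

Reading and writing the surrounding YAML is left out here. The patched string would go back into the inputs field of the MLmodel file before the model is served again.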

scikit-learn

webserver

mlflow