带有 Synapse ML 预测的 Azure Synapste 预测模型

Azure Synapste Predict Model with Synapse ML predict

我按照微软的官方教程:https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-score-model-predict-spark-pool

但是当我执行时:

#Bind model within Spark session
model = pcontext.bind_model(
    return_types=RETURN_TYPES, 
    runtime=RUNTIME, 
    model_alias="Sales", #This alias will be used in PREDICT call to refer  this   model
    model_uri=AML_MODEL_URI, #In case of AML, it will be AML_MODEL_URI
    aml_workspace=ws #This is only for AML. In case of ADLS, this parameter can be removed
).register()

我有:

NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel' Traceback (most recent call last):

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_context.py", line 47, in bind_model udf = _create_udf(

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_udf.py", line 104, in _create_udf model_runtime = runtime_gen._create_runtime()

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 103, in _create_runtime if self._check_model_runtime_compatibility(model_runtime):

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 166, in _check_model_runtime_compatibility model_wrapper = self._load()

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 78, in _load return SynapsePredictModelCache._get_or_load(

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_cache.py", line 172, in _get_or_load model = load_model(runtime, model_uri, functions)

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 257, in load_model model = loader.load(model_uri, functions)

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 122, in load model = self._load(model_uri)

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 215, in _load return self._load_mlflow(model_uri)

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 59, in _load_mlflow model = mlflow.pyfunc.load_model(model_uri)

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/pyfunc/init.py", line 640, in load_model model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))

File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/models/model.py", line 124, in load with open(path) as f:

NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel'

我该如何修复该错误?

(UPDATE:29/3/2022): You will experiencing this error message if you model does not contains all the required files in the ML model.

根据重现,我创建了两个名为:

的 ML 模型

sklearn_regression_model: Which contains only sklearn_regression_model.pkl file.

When I predict for MLFLOW packaged model named sklearn_regression_model, getting same error as shown above:

linear_regression: Which contains the below files:

When I predict for MLFLOW packaged model named linear_regression, it works as excepted.


It should be AML_MODEL_URI = "" #In URI ":x" => Rossman_Sales:2

Before running this script, update it with the URI for ADLS Gen2 data file along with model output return data type and ADLS/AML URI for the model file.

#Set model URI
       #Set AML URI, if trained model is registered in AML
          AML_MODEL_URI = "<aml model uri>" #In URI ":x" signifies model version in AML. You can   choose which model version you want to run. If ":x" is not provided then by default   latest version will be picked.

       #Set ADLS URI, if trained model is uploaded in ADLS
          ADLS_MODEL_URI = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/<model   mlflow folder path>"

来自 AML 工作区的模型 URI:

DATA_FILE = "abfss://data@cheprasynapse.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
AML_MODEL_URI_SKLEARN = "aml://mlflow_sklearn:1" #Here ":1" signifies model version in AML. We can choose which version we want to run. If ":1" is not provided then by default latest version will be picked
RETURN_TYPES = "INT"
RUNTIME = "mlflow"

上传到 ADLS Gen2 的模型 URI:

DATA_FILE = "abfss://data@cheprasynapse.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
AML_MODEL_URI_SKLEARN = "abfss://data@cheprasynapse.dfs.core.windows.net/linear_regression/linear_regression" #Here ":1" signifies model version in AML. We can choose which version we want to run. If ":1" is not provided then by default latest version will be picked
RETURN_TYPES = "INT"
RUNTIME = "mlflow"