AzureML Environment for Inference: can't add pip packages to dependencies

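I can't find the right way to add dependencies to my Azure Container Instance for ML inference.

I basically started by following this tutorial: Train and deploy an image classification model with an example Jupyter Notebook

It works fine.

Now I want to deploy my trained TensorFlow model for inference. I have tried many approaches, but I can never get Python dependencies added to the environment.

From the TensorFlow curated environment

Using AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference: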

from azureml.core import Workspace


# connect to your workspace
ws = Workspace.from_config()

# names
experiment_name = "my-experiment"
model_name = "my-model"
env_version="1"
env_name="my-env-"+env_version
service_name = str.lower(model_name + "-service-" + env_version)


# create environment for the deploy
from azureml.core.environment import Environment, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice

# get a curated environment
env = Environment.get(
    workspace=ws,
    name="AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference",
)
custom_env = env.clone(env_name)
custom_env.inferencing_stack_version='latest'

# add packages
conda_dep = CondaDependencies()
python_packages = ['joblib', 'numpy', 'os', 'json', 'tensorflow']
for package in python_packages:
    conda_dep.add_pip_package(package)
    conda_dep.add_conda_package(package)

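# Adds dependencies to PythonSection of env
# Note: with user_managed_dependencies=True, AzureML expects the image to
# already contain every package and does not install conda_dependencies.
custom_env.python.user_managed_dependencies=True
custom_env.python.conda_dependencies=conda_dep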

custom_env.register(workspace=ws)

# create deployment config i.e. compute resources
aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"experiment": experiment_name, "model": model_name},
)

from azureml.core.model import InferenceConfig
from azureml.core.model import Model

# get the registered model
model = Model(ws, model_name)

# create an inference config i.e. the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=custom_env)

# deploy the service
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
)

service.wait_for_deployment(show_output=True)

I get the following log:


AzureML image information: tensorflow-2.4-ubuntu18.04-py37-cpu-inference:20220110.v1


PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable: 

Pip Dependencies
---------------
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T10:21:09,855130300+00:00 - iot-server/finish 1 0
2022-01-24T10:21:09,856870100+00:00 - Exit code 1 is normal. Not restarting iot-server.
absl-py==0.15.0
applicationinsights==0.11.10
astunparse==1.6.3
azureml-inference-server-http==0.4.2
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.10
click==8.0.3
Flask==1.0.3
flatbuffers==1.12
gast==0.3.3
google-auth==2.3.3
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.32.0
gunicorn==20.1.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.10.0
inference-schema==1.3.0
itsdangerous==2.0.1
Jinja2==3.0.3
Keras-Preprocessing==1.1.2
Markdown==3.3.6
MarkupSafe==2.0.1
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
pandas==1.1.5
protobuf==3.19.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.2
pytz==2021.3
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
six==1.15.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.2
wrapt==1.12.1
zipp==3.7.0


Entry script directory: /var/azureml-app/.

Dynamic Python package installation is disabled.
Starting AzureML Inference Server HTTP.

Azure ML Inferencing HTTP server v0.4.2


Server Settings
---------------
Entry Script Name: score.py
Model Directory: /var/azureml-app/azureml-models/my-model/1
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:31311/
Score:          POST  127.0.0.1:31311/score

Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311 (69)
Using worker: sync
Booting worker with pid: 100
Exception in worker process
Traceback (most recent call last):
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/entry.py", line 1, in <module>
    import create_app
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/create_app.py", line 4, in <module>
    from routes_common import main
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/routes_common.py", line 32, in <module>
    from aml_blueprint import AMLBlueprint
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/aml_blueprint.py", line 28, in <module>
    main_module_spec.loader.exec_module(main)
  File "/var/azureml-app/score.py", line 4, in <module>
    import joblib
ModuleNotFoundError: No module named 'joblib'
Worker exiting (pid: 100)
Shutting down: Master
Reason: Worker failed to boot.
2022-01-24T10:21:13,851467800+00:00 - gunicorn/finish 3 0
2022-01-24T10:21:13,853259700+00:00 - Exit code 3 is not normal. Killing image.
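
The "Pip Dependencies" section of the log above is the quickest way to see what actually ended up in the image: joblib is not in the list, which matches the ModuleNotFoundError. A minimal sketch of that check, assuming the log has been copied into a string (the helper name pip_section_packages and the shortened sample log are hypothetical, not part of AzureML):

```python
def pip_section_packages(log_text):
    """Return {lowercased name: version} for every 'name==version' line in the log."""
    packages = {}
    for line in log_text.splitlines():
        line = line.strip()
        # Keep only bare 'name==version' lines; timestamps and messages contain spaces.
        if "==" in line and " " not in line:
            name, _, version = line.partition("==")
            packages[name.lower()] = version
    return packages


# Shortened excerpt of the deployment log shown above.
sample_log = """Pip Dependencies
---------------
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
numpy==1.19.5
tensorflow==2.4.0
"""

installed = pip_section_packages(sample_log)
missing = [p for p in ["joblib", "numpy", "tensorflow"] if p not in installed]
print(missing)  # prints ['joblib']
```

Packages installed from local files (the "name @ file:///..." lines in some logs) are ignored by this sketch.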

From a Conda specification

Same as before, but with a brand-new environment built from a Conda specification and a different env_version number:

# ...


env_version="2"

# ...

custom_env = Environment.from_conda_specification(name=env_name, file_path="my-env.yml")
custom_env.docker.base_image = DEFAULT_CPU_IMAGE

# ...

my-env.yml :

name: my-env
dependencies:
- python

- pip:
  - azureml-defaults
  - azureml-sdk
  - sklearn
  - numpy
  - matplotlib
  - joblib
  - uuid
  - requests
  - tensorflow

I get this log:

2022-01-24T11:06:54,887886931+00:00 - iot-server/run 
2022-01-24T11:06:54,891839877+00:00 - rsyslog/run 
2022-01-24T11:06:54,893640998+00:00 - gunicorn/run 
2022-01-24T11:06:54,912032812+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T11:06:55,398420960+00:00 - iot-server/finish 1 0
2022-01-24T11:06:55,414425146+00:00 - Exit code 1 is normal. Not restarting iot-server.

PATH environment variable: /opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable: 

Pip Dependencies
---------------
brotlipy==0.7.0
certifi==2020.6.20
cffi @ file:///tmp/build/80754af9/cffi_1605538037615/work
chardet @ file:///tmp/build/80754af9/chardet_1605303159953/work
conda==4.9.2
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1603018138503/work
cryptography @ file:///tmp/build/80754af9/cryptography_1605544449973/work
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1605545627475/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
requests @ file:///tmp/build/80754af9/requests_1592841827918/work
ruamel-yaml==0.15.87
six @ file:///tmp/build/80754af9/six_1605205313296/work
tqdm @ file:///tmp/build/80754af9/tqdm_1605303662894/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1603305693037/work

Starting HTTP server
2022-01-24T11:06:59,701365128+00:00 - gunicorn/finish 127 0
./run: line 127: exec: gunicorn: not found
2022-01-24T11:06:59,706177784+00:00 - Exit code 127 is not normal. Killing image.
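
One detail worth checking in both attempts is that os, json and uuid ship with Python, so they normally do not need to be listed as pip or conda packages. Whether this contributed to the failures above is not established by the logs. A minimal sketch for separating such names from installable ones, assuming Python 3.10+ for sys.stdlib_module_names (the fallback set and the helper name split_requirements are assumptions made for illustration):

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+. The fallback below is an
# assumption that only covers the standard-library names seen in this thread.
STDLIB_NAMES = getattr(sys, "stdlib_module_names", frozenset({"os", "json", "uuid"}))


def split_requirements(names):
    """Split names into (installable, stdlib), preserving the input order."""
    installable = [n for n in names if n not in STDLIB_NAMES]
    stdlib = [n for n in names if n in STDLIB_NAMES]
    return installable, stdlib


to_install, skipped = split_requirements(["joblib", "numpy", "os", "json", "tensorflow"])
print(to_install)  # prints ['joblib', 'numpy', 'tensorflow']
print(skipped)     # prints ['os', 'json']
```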
    

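I really don't know what I'm missing, and I have been searching for far too long (Azure docs, SO, ...).

Thanks for your help!

Edit: a non-exhaustive list of solutions I tried:

If you want to create a custom environment, you can use the code below to set the environment configuration.

Create the environment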

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment(name="Environment")

myenv.docker.enabled = True

myenv.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['numpy', 'scikit-learn', 'pip', 'pandas'],
    pip_packages=['azureml-defaults~=1.34.0', 'azureml', 'azureml-core~=1.34.0', 'azureml-sdk',
                  'inference-schema', 'azureml-telemetry~=1.34.0', 'azureml-train-automl~=1.34.0',
                  'azure-ml-api-sdk', 'python-dotenv', 'azureml-contrib-server',
                  'azureml-inference-server-http'],
)

Reference documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment(class)?view=azure-ml-py#:~:text=Upload%20the%20private%20pip%20wheel,in%20the%20workspace%20storage%20blob.&text=Build%20a%20Docker%20image%20for%20this%20environment%20in%20the%20cloud.&text=Build%20the%20local%20Docker%20or%20conda%20environment.

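I think there is a small security issue with using joblib on the Azure server; don't load it in your code and it will run.

OK, I got it working: I started over from scratch and it worked.

I don't know what went wrong in all my previous attempts, which is a shame.

Several problems, and how I (think I) solved them:

  • joblib: I actually didn't need it to load my Keras model. But the problem was not this particular library; it was that I couldn't add dependencies to the inference environment.
  • Environment: in the end, I could only make it work with a custom environment: Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml"). I was unable to add my libraries (or pin a specific package version) to the "curated environment". I don't know why...
  • TensorFlow: the last problem I hit was that I had trained and registered my model in the AzureML Notebook azureml_py38_PT_TF kernel (tensorflow==2.7.0) and then tried to load it in the inference Docker image (tensorflow==2.4.0). So I had to specify the TensorFlow version I wanted in the inference image (which required solving the previous problem first).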
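
The TensorFlow point above comes down to comparing the version used for training with the one in the inference image. A minimal sketch of such a check, assuming that a matching major.minor pair is the criterion (that criterion is an assumption made here, not a TensorFlow guarantee, and the function names are hypothetical):

```python
def major_minor(version):
    """Return (major, minor) as integers from a version string such as '2.7.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(training_version, inference_version):
    """True when both versions share the same major.minor pair."""
    return major_minor(training_version) == major_minor(inference_version)


# Versions taken from this thread: notebook kernel vs. curated inference image.
print(versions_match("2.7.0", "2.4.0"))  # prints False
print(versions_match("2.7.0", "2.7.1"))  # prints True
```

In the training notebook the first argument would typically come from tf.__version__, and the second from the "Pip Dependencies" section of the deployment log.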

What finally worked:

  • notebook.ipynb
import uuid
from azureml.core import Workspace, Environment, Model
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig


version = "test-"+str(uuid.uuid4())[:8]

env = Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

ws = Workspace.from_config()
# name of the registered model (same name as in the question)
model_name = "my-model"
model = Model(ws, model_name)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
)

service = Model.deploy(
    workspace=ws,
    name=version,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)
  • conda_dependencies.yml
channels:
- conda-forge
dependencies:
- python=3.8
- pip:
  - azureml-defaults
  - azureml-sdk
  - numpy
  - tensorflow==2.7.0
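
To keep the pinned versions in this file in step with the kernel the model was trained in, the pins can be read from the installed packages instead of typed by hand. A minimal sketch using the standard-library importlib.metadata (Python 3.8+); the helper name pin_requirements and its lookup parameter are hypothetical and exist only to make the function easy to test:

```python
from importlib import metadata


def pin_requirements(names, lookup=metadata.version):
    """Return 'name==version' for installed packages and the bare name otherwise."""
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            pinned.append(name)  # not installed in this interpreter, leave unpinned
    return pinned


# Run inside the training kernel, then paste the lines under "- pip:" in the YAML file.
for line in pin_requirements(["numpy", "tensorflow"]):
    print(f"  - {line}")
```

The output depends on what is installed in the interpreter that runs it, so no expected output is shown.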

  • score.py
import os
import json
import numpy as np
import tensorflow as tf


def init():
    global model

    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/data/model")
    model = tf.keras.models.load_model(model_path)



def run(raw_data):
    data = np.array(json.loads(raw_data)["data"])
    y_hat = model.predict(data)

    return y_hat.tolist()
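
For completeness, run() above expects a JSON string with a top-level "data" key. A minimal sketch of building such a request body; the sample rows are made up and the shape your model needs is an assumption you will have to adjust:

```python
import json


def make_payload(rows):
    """Serialize input rows into the {"data": ...} JSON body that run() parses."""
    return json.dumps({"data": rows})


# Hypothetical input: two rows with two features each.
payload = make_payload([[0.1, 0.2], [0.3, 0.4]])
print(payload)  # prints {"data": [[0.1, 0.2], [0.3, 0.4]]}

# With the deployed service object from notebook.ipynb, the call would look
# something like this (not executed here):
# predictions = service.run(input_data=payload)
```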