AzureML Environment for Inference: can't add pip packages to dependencies

Question

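I can't find the proper way to add dependencies to my Azure Container Instance for ML inference.

I started by basically following this tutorial: Train and deploy an image classification model with an example Jupyter Notebook

It works fine.

Now I want to deploy my trained TensorFlow model for inference. I have tried many approaches, but I have never been able to add Python dependencies to the environment.

From the TensorFlow curated environment

Using AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference: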

from azureml.core import Workspace


# connect to your workspace
ws = Workspace.from_config()

# names
experiment_name = "my-experiment"
model_name = "my-model"
env_version="1"
env_name="my-env-"+env_version
service_name = str.lower(model_name + "-service-" + env_version)


# create environment for the deploy
from azureml.core.environment import Environment, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice

# get a curated environment
env = Environment.get(
    workspace=ws,
    name="AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference",
)
custom_env = env.clone(env_name)
custom_env.inferencing_stack_version='latest'

# add packages
conda_dep = CondaDependencies()
python_packages = ['joblib', 'numpy', 'os', 'json', 'tensorflow']
for package in python_packages:
    conda_dep.add_pip_package(package)
    conda_dep.add_conda_package(package)

# Adds dependencies to PythonSection of env
custom_env.python.user_managed_dependencies=True
custom_env.python.conda_dependencies=conda_dep

custom_env.register(workspace=ws)

# create deployment config i.e. compute resources
aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"experiment": experiment_name, "model": model_name},
)

from azureml.core.model import InferenceConfig
from azureml.core.model import Model

# get the registered model
model = Model(ws, model_name)

# create an inference config i.e. the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=custom_env)

# deploy the service
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
)

service.wait_for_deployment(show_output=True)
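
A side note on the package list in the snippet above, offered as a sketch and not as a diagnosis: `os` and `json` ship with Python itself, so neither pip nor conda can install them. The helper below (`split_packages` is a hypothetical name, not part of the AzureML SDK) separates standard-library names from installable ones. It assumes Python 3.10+ for `sys.stdlib_module_names` and falls back to a small hand-written set on older interpreters.

```python
import sys

# sys.stdlib_module_names is available on Python 3.10+.
# The fallback set is a deliberately small assumption for older interpreters.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"os", "json", "sys", "uuid"})
)


def split_packages(names):
    """Return (installable, stdlib) lists, preserving the input order."""
    installable = [n for n in names if n not in STDLIB_NAMES]
    stdlib = [n for n in names if n in STDLIB_NAMES]
    return installable, stdlib


installable, stdlib = split_packages(["joblib", "numpy", "os", "json", "tensorflow"])
print("installable:", installable)
print("standard library, skip:", stdlib)
```

Only the first list would be worth passing to `add_pip_package` or to a conda specification file.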

I get the following log:


AzureML image information: tensorflow-2.4-ubuntu18.04-py37-cpu-inference:20220110.v1


PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable: 

Pip Dependencies
---------------
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T10:21:09,855130300+00:00 - iot-server/finish 1 0
2022-01-24T10:21:09,856870100+00:00 - Exit code 1 is normal. Not restarting iot-server.
absl-py==0.15.0
applicationinsights==0.11.10
astunparse==1.6.3
azureml-inference-server-http==0.4.2
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.10
click==8.0.3
Flask==1.0.3
flatbuffers==1.12
gast==0.3.3
google-auth==2.3.3
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.32.0
gunicorn==20.1.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.10.0
inference-schema==1.3.0
itsdangerous==2.0.1
Jinja2==3.0.3
Keras-Preprocessing==1.1.2
Markdown==3.3.6
MarkupSafe==2.0.1
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
pandas==1.1.5
protobuf==3.19.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.2
pytz==2021.3
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
six==1.15.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.2
wrapt==1.12.1
zipp==3.7.0


Entry script directory: /var/azureml-app/.

Dynamic Python package installation is disabled.
Starting AzureML Inference Server HTTP.

Azure ML Inferencing HTTP server v0.4.2


Server Settings
---------------
Entry Script Name: score.py
Model Directory: /var/azureml-app/azureml-models/my-model/1
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:31311/
Score:          POST  127.0.0.1:31311/score

Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311 (69)
Using worker: sync
Booting worker with pid: 100
Exception in worker process
Traceback (most recent call last):
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/entry.py", line 1, in <module>
    import create_app
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/create_app.py", line 4, in <module>
    from routes_common import main
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/routes_common.py", line 32, in <module>
    from aml_blueprint import AMLBlueprint
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/aml_blueprint.py", line 28, in <module>
    main_module_spec.loader.exec_module(main)
  File "/var/azureml-app/score.py", line 4, in <module>
    import joblib
ModuleNotFoundError: No module named 'joblib'
Worker exiting (pid: 100)
Shutting down: Master
Reason: Worker failed to boot.
2022-01-24T10:21:13,851467800+00:00 - gunicorn/finish 3 0
2022-01-24T10:21:13,853259700+00:00 - Exit code 3 is not normal. Killing image.
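
To check a deployment log like the one above for a missing dependency before reading the traceback, one option is to parse the `Pip Dependencies` section and compare it with the packages that `score.py` imports. This is a minimal sketch using only the standard library; `installed_from_log` and `missing_packages` are illustrative names, and the sample text is a short excerpt of the log above.

```python
import re

# Matches lines such as "numpy==1.19.5" or "cffi @ file:///tmp/...".
_PKG_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:==|@)\s*\S+")


def installed_from_log(log_text):
    """Collect lower-cased package names from pip-freeze style lines."""
    names = set()
    for line in log_text.splitlines():
        match = _PKG_LINE.match(line.strip())
        if match:
            names.add(match.group(1).lower())
    return names


def missing_packages(required, log_text):
    """Return the required packages that do not appear in the log."""
    installed = installed_from_log(log_text)
    return [name for name in required if name.lower() not in installed]


sample_log = """\
Pip Dependencies
---------------
gunicorn==20.1.0
numpy==1.19.5
tensorflow==2.4.0
"""

missing = missing_packages(["joblib", "numpy", "tensorflow"], sample_log)
print(missing)
```

If the list is not empty, the environment definition did not reach the image, whatever the scoring script does afterwards.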

From a Conda specification

Same as before, but with a fresh environment created from a Conda specification and a different env_version number:

# ...


env_version="2"

# ...

custom_env = Environment.from_conda_specification(name=env_name, file_path="my-env.yml")
custom_env.docker.base_image = DEFAULT_CPU_IMAGE

# ...

With my-env.yml:

name: my-env
dependencies:
- python

- pip:
  - azureml-defaults
  - azureml-sdk
  - sklearn
  - numpy
  - matplotlib
  - joblib
  - uuid
  - requests
  - tensorflow

I get this log:

2022-01-24T11:06:54,887886931+00:00 - iot-server/run 
2022-01-24T11:06:54,891839877+00:00 - rsyslog/run 
2022-01-24T11:06:54,893640998+00:00 - gunicorn/run 
2022-01-24T11:06:54,912032812+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T11:06:55,398420960+00:00 - iot-server/finish 1 0
2022-01-24T11:06:55,414425146+00:00 - Exit code 1 is normal. Not restarting iot-server.

PATH environment variable: /opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable: 

Pip Dependencies
---------------
brotlipy==0.7.0
certifi==2020.6.20
cffi @ file:///tmp/build/80754af9/cffi_1605538037615/work
chardet @ file:///tmp/build/80754af9/chardet_1605303159953/work
conda==4.9.2
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1603018138503/work
cryptography @ file:///tmp/build/80754af9/cryptography_1605544449973/work
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1605545627475/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
requests @ file:///tmp/build/80754af9/requests_1592841827918/work
ruamel-yaml==0.15.87
six @ file:///tmp/build/80754af9/six_1605205313296/work
tqdm @ file:///tmp/build/80754af9/tqdm_1605303662894/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1603305693037/work

Starting HTTP server
2022-01-24T11:06:59,701365128+00:00 - gunicorn/finish 127 0
./run: line 127: exec: gunicorn: not found
2022-01-24T11:06:59,706177784+00:00 - Exit code 127 is not normal. Killing image.

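I really don't know what I'm missing, and I've been searching for far too long (Azure docs, SO, ...).

Thanks for your help!

Edit: a non-exhaustive list of the solutions I have tried: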

How to create AzureML environement and add required packages
how to use existing conda environment as a AzureML environment
...
https://docs.microsoft.com/en-us/azure/machine-learning/concept-environments#environment-building-caching-and-reuse
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#add-packages-to-an-environment
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-inferencing-gpus
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#define-a-deployment-configuration
...

Answer 1

If you want to create a custom environment, you can use the code below to set up the environment configuration.

Creating the environment

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment(name="Environment")

myenv.docker.enabled = True

# Package specifiers must not contain stray spaces (e.g. "azureml- train-automl").
myenv.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['numpy', 'scikit-learn', 'pip', 'pandas'],
    pip_packages=[
        'azureml-defaults~=1.34.0', 'azureml', 'azureml-core~=1.34.0', 'azureml-sdk',
        'inference-schema', 'azureml-telemetry~=1.34.0', 'azureml-train-automl~=1.34.0',
        'azure-ml-api-sdk', 'python-dotenv', 'azureml-contrib-server',
        'azureml-inference-server-http',
    ],
)

Reference documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment(class)?view=azure-ml-py#:~:text=Upload%20the%20private%20pip%20wheel,in%20the%20workspace%20storage%20blob.&text=Build%20a%20Docker%20image%20for%20this%20environment%20in%20the%20cloud.&text=Build%20the%20local%20Docker%20or%20conda%20environment.

Answer 2

I think there is a minor security issue with using joblib on the Azure server. Don't load it in your code and it will run.

Answer 3

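OK, I got it working: I started over from scratch and it worked.

I don't know what went wrong in all my previous attempts, which is frustrating.

Several issues, and how I (think I) solved them:

joblib: I actually didn't need it to load my Keras model. But the problem was not this specific library; it was that I couldn't add any dependency to the inference environment.
Environment: in the end, I only got it to work with a custom environment: Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml"). I was not able to add my libraries (or pin a specific package version) to a "curated environment". I don't know why...
TensorFlow: the last problem I had was that I trained and registered my model in the azureml_py38_PT_TF kernel of an AzureML Notebook (tensorflow==2.7.0) and tried to load it in the inference Docker image (tensorflow==2.4.0). So I had to specify the TensorFlow version I wanted in the inference image (which required solving the previous issue).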
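
The TensorFlow mismatch described in the last item above can be caught early by comparing the version used for training with the one available at inference time. The sketch below compares only the major and minor components, which is a simplifying assumption and not an official compatibility rule for saved Keras models; `major_minor` and `same_major_minor` are hypothetical helpers.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '2.7.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_major_minor(trained_with, runtime):
    """True when both versions share the same major.minor pair."""
    return major_minor(trained_with) == major_minor(runtime)


# Versions reported in this answer: training kernel vs. curated inference image.
kernel_vs_image = same_major_minor("2.7.0", "2.4.0")
# A patch-level difference is treated as compatible under this assumption.
patch_only = same_major_minor("2.7.0", "2.7.1")
print(kernel_vs_image, patch_only)
```

In `score.py`, the runtime value could come from `tf.__version__`; the training value would have to be recorded alongside the model, for example as a model tag, which is an assumption about your workflow.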

What finally worked:

notebook.ipynb

import uuid
from azureml.core import Workspace, Environment, Model
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig


version = "test-"+str(uuid.uuid4())[:8]

env = Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

ws = Workspace.from_config()
model_name = "my-model"  # name of the registered model, as in the question
model = Model(ws, model_name)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
)

service = Model.deploy(
    workspace=ws,
    name=version,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)

conda_dependencies.yml

channels:
- conda-forge
dependencies:
- python=3.8
- pip:
  - azureml-defaults
  - azureml-sdk
  - numpy
  - tensorflow==2.7.0

score.py

import os
import json
import numpy as np
import tensorflow as tf


def init():
    global model

    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/data/model")
    model = tf.keras.models.load_model(model_path)



def run(raw_data):
    data = np.array(json.loads(raw_data)["data"])
    y_hat = model.predict(data)

    return y_hat.tolist()
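
For reference, `run` above expects a JSON string with a top-level `data` key. A minimal sketch of building such a payload with the standard library follows; `build_payload` is an illustrative name, and the input shape shown is a placeholder that has to be adapted to the model.

```python
import json


def build_payload(rows):
    """Serialize rows into the {"data": [...]} shape that run() parses."""
    return json.dumps({"data": rows})


# Two fake samples with three features each; real inputs must match the model.
payload = build_payload([[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]])

# Round-trip the payload the same way run() does before calling model.predict.
decoded = json.loads(payload)["data"]
print(len(decoded), len(decoded[0]))
```

With the azureml-core SDK, the resulting string should be usable as the input of `service.run(payload)` once the deployment is healthy.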
