AzureML Environment for Inference: can't add pip packages to dependencies
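I can't find the proper way to add dependencies to my Azure Container Instance for ML inference.
I basically started by following this tutorial: Train and deploy an image classification model with an example Jupyter Notebook
It works fine.
Now I want to deploy my trained TensorFlow model for inference. I tried many approaches, but I was never able to add Python dependencies to the environment.
From the TensorFlow curated environment
Using AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference: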
from azureml.core import Workspace
# connect to your workspace
ws = Workspace.from_config()
# names
experiment_name = "my-experiment"
model_name = "my-model"
env_version="1"
env_name="my-env-"+env_version
service_name = str.lower(model_name + "-service-" + env_version)
# create environment for the deploy
from azureml.core.environment import Environment, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice
# get a curated environment
env = Environment.get(
    workspace=ws,
    name="AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference",
)
custom_env = env.clone(env_name)
custom_env.inferencing_stack_version='latest'
# add packages
conda_dep = CondaDependencies()
python_packages = ['joblib', 'numpy', 'os', 'json', 'tensorflow']
for package in python_packages:
    conda_dep.add_pip_package(package)
    conda_dep.add_conda_package(package)
# Adds dependencies to PythonSection of env
custom_env.python.user_managed_dependencies=True
custom_env.python.conda_dependencies=conda_dep
custom_env.register(workspace=ws)
# create deployment config i.e. compute resources
aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"experiment": experiment_name, "model": model_name},
)
from azureml.core.model import InferenceConfig
from azureml.core.model import Model
# get the registered model
model = Model(ws, model_name)
# create an inference config i.e. the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=custom_env)
# deploy the service
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
)
service.wait_for_deployment(show_output=True)
I get the following log:
AzureML image information: tensorflow-2.4-ubuntu18.04-py37-cpu-inference:20220110.v1
PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable:
Pip Dependencies
---------------
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T10:21:09,855130300+00:00 - iot-server/finish 1 0
2022-01-24T10:21:09,856870100+00:00 - Exit code 1 is normal. Not restarting iot-server.
absl-py==0.15.0
applicationinsights==0.11.10
astunparse==1.6.3
azureml-inference-server-http==0.4.2
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.10
click==8.0.3
Flask==1.0.3
flatbuffers==1.12
gast==0.3.3
google-auth==2.3.3
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.32.0
gunicorn==20.1.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.10.0
inference-schema==1.3.0
itsdangerous==2.0.1
Jinja2==3.0.3
Keras-Preprocessing==1.1.2
Markdown==3.3.6
MarkupSafe==2.0.1
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
pandas==1.1.5
protobuf==3.19.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.2
pytz==2021.3
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
six==1.15.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.2
wrapt==1.12.1
zipp==3.7.0
Entry script directory: /var/azureml-app/.
Dynamic Python package installation is disabled.
Starting AzureML Inference Server HTTP.
Azure ML Inferencing HTTP server v0.4.2
Server Settings
---------------
Entry Script Name: score.py
Model Directory: /var/azureml-app/azureml-models/my-model/1
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None
Server Routes
---------------
Liveness Probe: GET 127.0.0.1:31311/
Score: POST 127.0.0.1:31311/score
Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311 (69)
Using worker: sync
Booting worker with pid: 100
Exception in worker process
Traceback (most recent call last):
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/opt/miniconda/envs/amlenv/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/entry.py", line 1, in <module>
import create_app
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/create_app.py", line 4, in <module>
from routes_common import main
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/routes_common.py", line 32, in <module>
from aml_blueprint import AMLBlueprint
File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/aml_blueprint.py", line 28, in <module>
main_module_spec.loader.exec_module(main)
File "/var/azureml-app/score.py", line 4, in <module>
import joblib
ModuleNotFoundError: No module named 'joblib'
Worker exiting (pid: 100)
Shutting down: Master
Reason: Worker failed to boot.
2022-01-24T10:21:13,851467800+00:00 - gunicorn/finish 3 0
2022-01-24T10:21:13,853259700+00:00 - Exit code 3 is not normal. Killing image.
From a Conda specification
Same as before, but with a fresh environment built from a Conda specification and a different env_version number:
# ...
env_version="2"
# ...
custom_env = Environment.from_conda_specification(name=env_name, file_path="my-env.yml")
custom_env.docker.base_image = DEFAULT_CPU_IMAGE
# ...
With my-env.yml:
name: my-env
dependencies:
  - python
  - pip:
    - azureml-defaults
    - azureml-sdk
    - sklearn
    - numpy
    - matplotlib
    - joblib
    - uuid
    - requests
    - tensorflow
I get this log:
2022-01-24T11:06:54,887886931+00:00 - iot-server/run
2022-01-24T11:06:54,891839877+00:00 - rsyslog/run
2022-01-24T11:06:54,893640998+00:00 - gunicorn/run
2022-01-24T11:06:54,912032812+00:00 - nginx/run
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T11:06:55,398420960+00:00 - iot-server/finish 1 0
2022-01-24T11:06:55,414425146+00:00 - Exit code 1 is normal. Not restarting iot-server.
PATH environment variable: /opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable:
Pip Dependencies
---------------
brotlipy==0.7.0
certifi==2020.6.20
cffi @ file:///tmp/build/80754af9/cffi_1605538037615/work
chardet @ file:///tmp/build/80754af9/chardet_1605303159953/work
conda==4.9.2
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1603018138503/work
cryptography @ file:///tmp/build/80754af9/cryptography_1605544449973/work
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1605545627475/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
requests @ file:///tmp/build/80754af9/requests_1592841827918/work
ruamel-yaml==0.15.87
six @ file:///tmp/build/80754af9/six_1605205313296/work
tqdm @ file:///tmp/build/80754af9/tqdm_1605303662894/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1603305693037/work
Starting HTTP server
2022-01-24T11:06:59,701365128+00:00 - gunicorn/finish 127 0
./run: line 127: exec: gunicorn: not found
2022-01-24T11:06:59,706177784+00:00 - Exit code 127 is not normal. Killing image.
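I really don't know what I am missing, and I have been searching for too long already (Azure docs, SO, ...).
Thanks for your help!
Edit: a non-exhaustive list of solutions I tried: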
- How to create AzureML environement and add required packages
- how to use existing conda environment as a AzureML environment
- ...
- https://docs.microsoft.com/en-us/azure/machine-learning/concept-environments#environment-building-caching-and-reuse
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#add-packages-to-an-environment
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-inferencing-gpus
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#define-a-deployment-configuration
- ...
If you want to create a custom environment, you can use the code below to set the environment configuration.
Creating the environment
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
myenv = Environment(name="Environment")
myenv.docker.enabled = True
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['numpy', 'scikit-learn', 'pip', 'pandas'], pip_packages=['azureml-defaults~=1.34.0', 'azureml', 'azureml-core~=1.34.0', 'azureml-sdk', 'inference-schema', 'azureml-telemetry~=1.34.0', 'azureml-train-automl~=1.34.0', 'azure-ml-api-sdk', 'python-dotenv', 'azureml-contrib-server', 'azureml-inference-server-http'])
I think there is a small security issue with using joblib on Azure servers. Just don't load it in your code and it will run.
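OK, I got it working: I started over from scratch and it worked.
I have no idea what was wrong in all my previous attempts, which is frustrating.
Multiple problems, and how I (think I) solved them:
- joblib: I actually didn't need it to load my Keras model. But the problem was not this specific library; it was that I couldn't add dependencies to the inference environment.
- Environment: In the end, I was only able to make things work with a custom environment: Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml"). I haven't been able to add my libraries (or pin a specific package version) to a "curated environment". I don't know why...
- TensorFlow: The last problem was that I trained and registered my model in the AzureML Notebook's azureml_py38_PT_TF kernel (tensorflow==2.7.0) and tried to load it in the inference Docker image (tensorflow==2.4.0). So I had to specify the TensorFlow version I wanted in the inference image (which required solving the previous point first).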
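A side note on the curated-environment attempt in the question, offered as a possible explanation rather than a confirmed diagnosis. The package list there includes os and json, which are standard-library modules and not installable packages. In addition, the SDK documentation describes python.user_managed_dependencies=True as telling Azure ML to reuse the image's existing Python environment instead of building one from conda_dependencies, which would be consistent with joblib never being installed. A minimal sketch for filtering standard-library names out of such a list (the helper name split_stdlib is hypothetical):

```python
import sys

# Package list as written in the question's first attempt.
requested = ["joblib", "numpy", "os", "json", "tensorflow"]


def split_stdlib(names):
    """Separate standard-library module names from installable packages."""
    # sys.stdlib_module_names exists on Python 3.10+. The fallback set is a
    # hand-written assumption that only covers the names used in this example.
    stdlib = getattr(sys, "stdlib_module_names", frozenset({"os", "json"}))
    installable = [n for n in names if n not in stdlib]
    builtin = [n for n in names if n in stdlib]
    return installable, builtin


installable, builtin = split_stdlib(requested)
print("pip candidates:", installable)
print("already in the standard library:", builtin)
```

Only the first list would belong in a pip or conda dependency section.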
What finally worked:
- notebook.ipynb
import uuid
from azureml.core import Workspace, Environment, Model
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig
version = "test-"+str(uuid.uuid4())[:8]
env = Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)
ws = Workspace.from_config()
model_name = "my-model"  # name of the registered model, as in the question
model = Model(ws, model_name)
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
)
service = Model.deploy(
    workspace=ws,
    name=version,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)
- conda_dependencies.yml
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip:
    - azureml-defaults
    - azureml-sdk
    - numpy
    - tensorflow==2.7.0
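The tensorflow==2.7.0 pin above is there to match the kernel the model was trained in. As a rough sketch, the installed versions could be read from the training kernel with the standard library's importlib.metadata (Python 3.8+) instead of being looked up by hand. The helper name pin_line is hypothetical.

```python
from importlib import metadata


def pin_line(package):
    """Return 'package==version' for an installed distribution, or None if absent."""
    try:
        return f"{package}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return None


# Run this in the training kernel; any non-None result can be copied
# into the pip section of conda_dependencies.yml.
for name in ["tensorflow", "numpy"]:
    print(name, "->", pin_line(name))
```

A None result means the package is not installed in the interpreter that ran the snippet, so there is nothing to pin from that kernel.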
- score.py
import os
import json
import numpy as np
import tensorflow as tf
def init():
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/data/model")
    model = tf.keras.models.load_model(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)["data"])
    y_hat = model.predict(data)
    return y_hat.tolist()
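For reference, run() above reads json.loads(raw_data)["data"], so the request body has to be a JSON object with a "data" key. A minimal sketch of building such a body and checking the round trip locally. The sample values and their shape are made up for illustration and must be replaced with whatever the registered model actually expects.

```python
import json

# Hypothetical input: two samples with three features each.
samples = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# The body is a JSON object whose "data" key holds the nested list.
raw_data = json.dumps({"data": samples})

# Mirror the first step of run() locally to confirm the round trip.
decoded = json.loads(raw_data)["data"]
print(len(decoded), "samples of", len(decoded[0]), "features")
```

Once the deployment is healthy, a string built this way could presumably be sent with service.run(input_data=raw_data) or posted to the service's scoring URI.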