Azure ML: how to access logs of a failed Model deployment
I'm deploying a Keras model and it is failing with the error below. The exception says that I can retrieve the logs by running print(service.get_logs()), but that gives me empty results. I am deploying the model from my Azure Notebook, and I'm using the same service variable to retrieve the logs.
Also, how can I retrieve the logs from the container instance? I'm deploying to an AKS compute cluster I created. Sadly, the docs link in the exception doesn't detail how to retrieve these logs either.
More information can be found using '.get_logs()' Error:

{
  "code": "KubernetesDeploymentFailed",
  "statusCode": 400,
  "message": "Kubernetes Deployment failed",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.\nPlease check the logs for your container instance: my-model-service. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. \nYou can also try to run image mlwks.azurecr.io/azureml/azureml_3c0c34b65cf18c8644e8d745943ab7d2:latest locally. Please refer to http://aka.ms/debugimage#service-launch-fails for more information."
    }
  ]
}
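When get_logs() comes back empty, one likely cause is that the service variable still points at the handle returned by the failed Model.deploy() call. A minimal sketch of re-fetching the service by name and requesting a larger log window (assuming the v1 azureml-core SDK, whose get_logs() takes a num_lines argument):

from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# Re-fetch the service by name instead of reusing the variable from the
# failed Model.deploy() call, then pull a larger chunk of the container log.
service = Webservice(ws, 'my-model-service')
print(service.get_logs(num_lines=5000))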
Update
Here is the code I used to deploy the model:
from azureml.core import Environment
from azureml.core.compute import ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import Webservice
from azureml.exceptions import WebserviceException

environment = Environment('my-environment')
environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults", "azureml-dataprep[pandas,fuse]",
                  "tensorflow", "keras", "matplotlib"])

service_name = 'my-model-service'

# Remove any existing service under the same name.
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    pass

inference_config = InferenceConfig(entry_script='score.py', environment=environment)
comp = ComputeTarget(workspace=ws, name="ml-inference-dev")
service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_target=comp)
service.wait_for_deployment(show_output=True)
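Since the crash happens inside init(), one way to see the traceback without round-tripping through AKS is a local Docker deployment. A sketch using the v1 SDK's LocalWebservice (assumes Docker is available on the machine running the notebook):

from azureml.core.webservice import LocalWebservice

# Deploy the same model and inference_config into a local Docker container;
# errors raised by score.py's init() show up directly in the output.
local_config = LocalWebservice.deploy_configuration(port=8890)
local_service = Model.deploy(workspace=ws,
                             name='my-model-local',
                             models=[model],
                             inference_config=inference_config,
                             deployment_config=local_config)
local_service.wait_for_deployment(show_output=True)
print(local_service.get_logs())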
And my score.py:
import numpy as np
from keras.models import load_model

from azureml.core.model import Model
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    global model
    model_path = Model.get_model_path('model.h5')
    model = load_model(model_path)


# The run() method is called each time a request is made to the scoring API.
#
# Shown here are the optional input_schema and output_schema decorators
# from the inference-schema pip package. Using these decorators on your
# run() method parses and validates the incoming payload against
# the example input you provide here. This will also generate a Swagger
# API document for your web service.
@input_schema('data', NumpyParameterType(np.array([[0.1, 1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0]])))
@output_schema(NumpyParameterType(np.array([4429.929236457418])))
def run(data):
    return [123]  # test
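Even before any deployment, the scoring script can be smoke-tested in plain Python. A sketch under the assumption that model.h5 is resolvable locally (for example via an azureml-models folder in the working directory); a test like this fails fast on problems such as a missing inference_schema module:

import numpy as np

import score  # the scoring script above

# init() surfaces missing packages or a bad model file immediately; run()
# exercises the schema decorators with the same example payload they declare.
score.init()
print(score.run(np.array([[0.1, 1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0]])))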
Update 2:
Here is a screenshot of the endpoint page. Is a CPU of .1 normal? Also, when I hit the swagger URL in my browser, I get the error: "No ready replicas for service doc-classify-env-service".
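On the .1 CPU: that matches the small default per-replica CPU request rather than a problem, and "No ready replicas" is what you see while the container is crash-looping, since no replica ever passes its readiness probe. Resources and replica count can be set explicitly through a deployment configuration; a sketch with the v1 SDK (values are illustrative):

from azureml.core.webservice import AksWebservice

# Request explicit resources per replica instead of the small defaults.
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=2,
                                                num_replicas=1)
service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aks_config,
                       deployment_target=comp)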
Update 3
After finally getting into the container logs, it turned out my score.py was hitting this error:
ModuleNotFoundError: No module named 'inference_schema'
I then ran a test with the "input_schema" and "output_schema" references commented out and my pip_packages simplified, and the REST endpoint came up! I was also able to get predictions out of the model.
pip_packages=["azureml-defaults", "tensorflow", "keras"]
So my question is, how should I structure my pip_packages so that the scoring file can use the inference_schema decorators? I assume I need to include the azureml-sdk[automl] pip package, but when I do, the image creation fails and I see several dependency conflicts.
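The decorators ship as their own pip package (inference-schema on PyPI), so the full azureml-sdk[automl] shouldn't be necessary; a sketch of the dependency list under that assumption, with the numpy extra that backs NumpyParameterType:

environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults",
                  "inference-schema[numpy-support]",  # provides the decorators
                  "tensorflow",
                  "keras"])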
Try retrieving your service directly from the workspace:
ws.webservices[service_name].get_logs()
Also, I've found it easier to deploy an image as an endpoint than the inference + deploy-model route (depends on your use case though):
from azureml.core.image import Image
from azureml.core.webservice import AksWebservice

my_image = Image(ws, name='test', version='26')
service = AksWebservice.deploy_from_image(ws, "test1", my_image, deployment_config, aks_target)