AML - Web service TimeoutError

We created a web service endpoint and tested it with the following code and with POSTMAN.

We deployed the service to an AKS cluster in the same resource group and subscription as the AML resource.

UPDATE: The attached AKS cluster has a custom networking configuration and rejects external connections.

import numpy
import os, json, datetime, sys
from operator import attrgetter
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.image import Image
from azureml.core.webservice import Webservice
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
# Get workspace
ws = Workspace.from_config(auth=cli_auth)

# Get the AKS Details
try:
    with open("../aml_config/aks_webservice.json") as f:
        config = json.load(f)
except (OSError, ValueError):
    # Config file is missing or is not valid JSON, so there is nothing to test
    print("No new model, thus no deployment on AKS")
    # raise Exception('No new model to register as production model perform better')
    sys.exit(0)

service_name = config["aks_service_name"]
# Get the hosted web service
service = Webservice(workspace=ws, name=service_name)

# Input for Model with all features
input_j = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]
print(input_j)
test_sample = json.dumps({"data": input_j})
test_sample = bytes(test_sample, encoding="utf8")
try:
    prediction = service.run(input_data=test_sample)
    print(prediction)
except Exception as e:
    result = str(e)
    print(result)
    raise Exception("AKS service is not working as expected") from e

In AML Studio, the deployment state is "Healthy".

We get the following error when testing:

Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'

Log taken right after deploying the AKS web service: here

Log taken after running the test script: here

How can we find out what is causing this issue and fix it?

Have you tried service.get_logs()? Please also try a local deployment first: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-local-container-notebook-vm
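To make the get_logs() suggestion concrete, here is a minimal sketch that collects the service state, scoring URI and recent container logs in one place. The helper name collect_diagnostics is hypothetical, and it assumes the object passed in behaves like the azureml.core.webservice.Webservice from the question (state, scoring_uri and get_logs(num_lines=...)).

```python
def collect_diagnostics(service, num_lines=200):
    """Gather basic troubleshooting data from a deployed web service.

    `service` is assumed to be an azureml.core.webservice.Webservice, or any
    object exposing the same `state`, `scoring_uri` and `get_logs` members.
    `collect_diagnostics` itself is a hypothetical helper, not an SDK function.
    """
    return {
        "state": service.state,              # e.g. "Healthy"
        "scoring_uri": service.scoring_uri,  # endpoint the client must reach
        "logs": service.get_logs(num_lines=num_lines),  # recent container logs
    }
```

With the service object from the test script you would call collect_diagnostics(service) and print the "logs" entry. If the logs show no incoming requests at all when the test script runs, the request is probably not reaching the cluster, which points to networking rather than the scoring script.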

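I'm not sure what the difference is between Webservice and AKSWebservice, but please try the AKS variant link. I would also try deploying through ACI and validating your dependencies and scoring script to determine whether this is an AKS issue.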

We checked the AKS networking configuration and found that it has an Azure CNI profile.

To test the web service, we need to run the test from inside the virtual network that was created. That worked!
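For anyone hitting the same WinError 10060, a quick way to tell a network problem from a scoring problem is to check whether the machine running the test can open a TCP connection to the scoring endpoint at all. The sketch below uses only the Python standard library; can_reach is a hypothetical helper name, and it assumes you pass it the service's scoring URI (for example service.scoring_uri).

```python
import socket
from urllib.parse import urlparse


def can_reach(scoring_uri, timeout=5.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    Hypothetical helper: it only tests network reachability and does not
    send a scoring request or check authentication.
    """
    parsed = urlparse(scoring_uri)
    # Fall back to the default port for the scheme when the URI has none
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures
        return False
```

If this returns False from your workstation but True from a VM inside the virtual network, the service itself is probably fine and the timeout comes from the cluster's network configuration, which matches what we saw here.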