How to get a reference to the AzureML Workspace class in a scoring script?

Question

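My scoring function needs to reference an Azure ML registered dataset, so I need a reference to the AzureML Workspace object. When I include it in the init() function of the scoring script, I get the following error: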

"code": "ScoreInitRestart",
"message": "Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config."

On debugging, the issue turns out to be this prompt:

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code [REDACTED] to authenticate.

How can I resolve this without exposing the service principal credentials in the scoring script?

Answer 1

Does your score.py contain a Workspace.get() call with auth=InteractiveAuthentication? You should swap that for ServicePrincipalAuthentication (docs), ideally passing the credentials to it through environment variables.

import os
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Read the secret from an environment variable instead of hard-coding it
svc_pr_password = os.environ.get("AZUREML_PASSWORD")

svc_pr = ServicePrincipalAuthentication(
    tenant_id="my-tenant-id",
    service_principal_id="my-application-id",
    service_principal_password=svc_pr_password)

# Non-interactive: no device-login prompt when the service starts
ws = Workspace(
    subscription_id="my-subscription-id",
    resource_group="my-ml-rg",
    workspace_name="my-ml-workspace",
    auth=svc_pr
    )

print("Found workspace {} at location {}".format(ws.name, ws.location))
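To avoid hard-coding the remaining IDs as well, every value can come from an environment variable. The following is a minimal stdlib-only sketch, not part of the AzureML SDK: the helper `load_service_principal_settings` and every variable name except `AZUREML_PASSWORD` are hypothetical. It fails fast and lists all missing variables at once, which is easier to diagnose than an init() that stalls on an interactive device-login prompt.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServicePrincipalSettings:
    """Values needed for ServicePrincipalAuthentication and Workspace."""
    tenant_id: str
    client_id: str
    client_secret: str
    subscription_id: str
    resource_group: str
    workspace_name: str


# Hypothetical variable names; only AZUREML_PASSWORD appears in the answer above.
_ENV_NAMES = {
    "tenant_id": "AZUREML_TENANT_ID",
    "client_id": "AZUREML_CLIENT_ID",
    "client_secret": "AZUREML_PASSWORD",
    "subscription_id": "AZUREML_SUBSCRIPTION_ID",
    "resource_group": "AZUREML_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def load_service_principal_settings(
    environ: Optional[Mapping[str, str]] = None,
) -> ServicePrincipalSettings:
    """Read all settings at once and report every missing variable together."""
    env = os.environ if environ is None else environ
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return ServicePrincipalSettings(
        **{field: env[name] for field, name in _ENV_NAMES.items()}
    )
```

The resulting fields map directly onto `ServicePrincipalAuthentication(tenant_id=s.tenant_id, service_principal_id=s.client_id, service_principal_password=s.client_secret)` and `Workspace(subscription_id=s.subscription_id, resource_group=s.resource_group, workspace_name=s.workspace_name, auth=...)`.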

Answer 2

You can get the workspace object directly from your run:

from azureml.core.run import Run
ws = Run.get_context().experiment.workspace

Answer 3

I found a workaround for referencing the workspace in the scoring script. Here is a code snippet showing how:

My deployment script looks like this:

from azureml.core import Environment
from azureml.core.model import InferenceConfig

# tenant_id, subscription_id, resource_group, client_id and client_secret
# are assumed to be defined earlier (e.g. read from Key Vault)

# Add Python dependencies for the models
scoringenv = Environment.from_conda_specification(
    name="scoringenv",
    file_path="config_files/scoring_env.yml"
)

# Create a dictionary to set up the environment variables
env_variables = {
    'tenant_id': tenant_id,
    'subscription_id': subscription_id,
    'resource_group': resource_group,
    'client_id': client_id,
    'client_secret': client_secret
}

scoringenv.environment_variables = env_variables

# Configure the scoring environment
inference_config = InferenceConfig(
    entry_script='score.py',
    source_directory='scripts/',
    environment=scoringenv
)

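What I do here is create an image with the Python dependencies (listed in scoring_env.yml) and pass a dictionary of secrets as environment variables. I store the secrets in Key Vault. You can define and pass variables of native Python data types.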
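Environment variables that reach the container's process are plain strings at the operating-system level, so native Python values such as integers or booleans are safest converted explicitly before being assigned to `scoringenv.environment_variables`, and parsed back in score.py. The helpers below are a hypothetical stdlib-only sketch of that round trip; `to_env_strings` and `env_bool` are illustrative names, not part of AzureML.

```python
import os
from typing import Any, Dict, Mapping, Optional


def to_env_strings(values: Mapping[str, Any]) -> Dict[str, str]:
    """Convert native Python values to strings, rejecting unset (None) values."""
    result = {}
    for key, value in values.items():
        if value is None:
            raise ValueError(f"Environment variable {key!r} has no value")
        if isinstance(value, bool):
            # Store booleans as lowercase words so they parse back predictably.
            result[key] = "true" if value else "false"
        else:
            result[key] = str(value)
    return result


def env_bool(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean written by to_env_strings back into a bool."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}
```

In the deployment script this would be used as `scoringenv.environment_variables = to_env_strings(env_variables)`; rejecting `None` catches a secret that was never loaded before the image is built.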

Now, in my score.py, I reference these environment variables inside init() like this:

import os

# Values set through scoringenv.environment_variables in the deployment script
tenant_id = os.environ.get('tenant_id')
client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
subscription_id = os.environ.get('subscription_id')
resource_group = os.environ.get('resource_group')
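Because the AzureML inference server calls init() once when the container starts and run() once per request, the usual pattern is to do the authentication in init() and keep the result in a module-level global. The sketch below shows that shape with the standard library only; `_connect_workspace` is a hypothetical stand-in for the `ServicePrincipalAuthentication` plus `Workspace(...)` calls from the first answer, and returns a plain dict here so the sketch runs without the SDK.

```python
import json
import os

# Module-level state shared between init() and run().
workspace = None

_ENV_NAMES = ["tenant_id", "client_id", "client_secret",
              "subscription_id", "resource_group"]


def _connect_workspace(tenant_id, client_id, client_secret,
                       subscription_id, resource_group):
    """Hypothetical stand-in: in a real score.py, build
    ServicePrincipalAuthentication(...) and Workspace(..., auth=...) here."""
    return {"subscription_id": subscription_id, "resource_group": resource_group}


def init():
    global workspace
    # os.environ[...] raises KeyError if a variable was not set at deployment.
    values = {name: os.environ[name] for name in _ENV_NAMES}
    workspace = _connect_workspace(**values)


def run(raw_data):
    # Reuse the object created in init(); do not re-authenticate per request.
    payload = json.loads(raw_data)
    return {"resource_group": workspace["resource_group"], "echo": payload}
```

Creating the workspace once in init() keeps per-request latency low and means credentials are read only at startup.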

Once you have these variables, you can create the workspace object with service principal authentication, as @Anders Swanson mentioned in his answer.

Another way to solve this might be to use a managed identity for AKS. I have not explored that option.

Hope this helps! If you find a better way to solve this, please let me know.

Thanks!

Answer 4

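I faced the same challenge. Since you mention AML datasets, I assume an AML batch endpoint fits your scenario. The scoring script of a batch endpoint is designed to receive a list of files as input. When you invoke a batch endpoint, you can pass (among other things) an AML dataset, since the endpoint is deployed in the context of an AML workspace. Take a look at this.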
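A batch-endpoint scoring script keeps the init()/run() shape, but run() receives a mini-batch, which for file input is a list of file paths, and returns one result per processed item. The sketch below is a stdlib-only illustration under that assumption; the per-file work (counting lines) is a hypothetical placeholder for real model inference, and no workspace object is needed because the data arrives as files.

```python
import os
from typing import List

model = None


def _count_lines(text: str) -> int:
    """Hypothetical placeholder for a real model's predict function."""
    return len(text.splitlines())


def init():
    global model
    # Load the model once per worker process, not once per mini-batch.
    model = _count_lines


def run(mini_batch: List[str]) -> List[str]:
    """Score each file in the mini-batch and return one result row per file."""
    results = []
    for file_path in mini_batch:
        with open(file_path, encoding="utf-8") as f:
            score = model(f.read())
        results.append(f"{os.path.basename(file_path)},{score}")
    return results
```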
