Using a custom docker with Azure ML

I'm following the guide (https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments) to use a custom Dockerfile on Azure. My script for creating the environment looks like this:

from azureml.core.environment import Environment

myenv = Environment(name = "myenv")
myenv.docker.enabled = True
dockerfile = r"""
FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
RUN apt-get update && apt-get install -y libgl1-mesa-glx
RUN echo "Hello from custom container!"
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile

When this runs, the Dockerfile is completely ignored and libgl1 is not installed. Any ideas?

EDIT: Here's the rest of my code:

est = Estimator(
    source_directory = '.',
    script_params = script_params,
    use_gpu = True,
    compute_target = 'gpu-cluster-1',
    pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
    )

run = exp.submit(config = est)
run.wait_for_completion(show_output=True)

It's completely understandable that you're struggling with this; others have also expressed a need for more information.

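  1. Perhaps base_dockerfile needs to be a text file (with the contents inside) rather than a string? I'll ask the environments PM to be more specific about how this works.
  2. Another option is to use Azure Container Instances (ACI). An ACI is created automatically when you spin up an Azure ML workspace. See this GitHub issue for details.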

For details on using Docker in environments, see the "Enable Docker" section of the article: https://docs.microsoft.com/azure/machine-learning/how-to-use-environments#enable-docker
The following example shows how to load the docker steps as a string.

   from azureml.core import Environment
   myenv = Environment(name="myenv")

   # Creates the environment inside a Docker container.
   myenv.docker.enabled = True

   # Specify docker steps as a string.
   dockerfile = r'''
   FROM mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04
   RUN echo "Hello from custom container!"
   '''

   # Alternatively, load from a file.
   #with open("dockerfiles/Dockerfile", "r") as f:
   #    dockerfile=f.read()

   myenv.docker.base_dockerfile = dockerfile

Installing the library works without any problem. First, dump your Dockerfile contents into a file; that is easier to maintain and read ;)

e = Environment("custom")
e.docker.base_dockerfile = "path/to/your/dockerfile"

This sets the contents of the file as the string property.
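As a rough illustration of the "path or string" behaviour described above, here is a minimal sketch in plain Python. `resolve_dockerfile` is a hypothetical helper, not an Azure ML SDK function, and it rests on one assumption: a value that names an existing file should be replaced by that file's contents, while anything else is already Dockerfile text.

```python
import os
import tempfile


def resolve_dockerfile(value):
    """Return Dockerfile text for a value that is either a path or the text itself.

    Hypothetical helper for illustration only; not part of the Azure ML SDK.
    """
    # A value naming an existing file is treated as a path and read from disk.
    if os.path.isfile(value):
        with open(value, "r") as f:
            return f.read()
    # Anything else is assumed to already be Dockerfile content.
    return value


DOCKERFILE_TEXT = (
    "FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04\n"
    "RUN apt-get update && apt-get install -y libgl1-mesa-glx\n"
)

# Write the content to a temporary file, then resolve it from the path.
with tempfile.TemporaryDirectory() as tmp:
    dockerfile_path = os.path.join(tmp, "Dockerfile")
    with open(dockerfile_path, "w") as f:
        f.write(DOCKERFILE_TEXT)
    from_path = resolve_dockerfile(dockerfile_path)

# Resolving the raw text directly should give the same result.
from_string = resolve_dockerfile(DOCKERFILE_TEXT)
```

Either way the environment would end up with the same Dockerfile text, so keeping the steps in a file is mostly a readability choice.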

e.register(ws).build(ws).wait_for_completion()

Step 2/16 of the build output will show your apt update and libgl1 installation.

Note that this should work with SDK >= 1.7.
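To confirm that the apt-get step really ran, the build output can be scanned for the matching step. The sketch below is plain Python and makes two assumptions that are not from the answer: the log uses the classic Docker builder format `Step N/M : INSTRUCTION`, and the log text is already available as a string. `find_steps` is a hypothetical helper, and the sample log is made up for illustration.

```python
import re

# Assumed log shape: the classic Docker builder prints "Step N/M : INSTRUCTION".
STEP_PATTERN = re.compile(r"^Step (\d+)/(\d+) : (.*)$")


def find_steps(log_text, needle):
    """Return (step, total, instruction) tuples whose instruction contains needle."""
    matches = []
    for line in log_text.splitlines():
        m = STEP_PATTERN.match(line.strip())
        if m and needle in m.group(3):
            matches.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return matches


# Made-up log excerpt, for illustration only.
sample_log = "\n".join([
    "Step 1/16 : FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04",
    "Step 2/16 : RUN apt-get update && apt-get install -y libgl1-mesa-glx",
    "Step 3/16 : RUN echo \"Hello from custom container!\"",
])

hits = find_steps(sample_log, "libgl1-mesa-glx")
```

An empty result would suggest the custom Dockerfile was not used for the build, which is the symptom described in the question.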

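I see you're using an Estimator. An Estimator creates its own environment unless you set the environment_definition parameter, which I don't see in your snippet. I'm looking at https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py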

I haven't tried it, but I think you can fix this by changing your code to the following:

est = Estimator(
    source_directory = '.',
    script_params = script_params,
    use_gpu = True,
    compute_target = 'gpu-cluster-1',
    pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
    environment_definition = myenv
    )

run = exp.submit(config = est)
run.wait_for_completion(show_output=True)

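You may also need to move the use_gpu setting into the environment definition, because the SDK page I linked above says the environment takes precedence over this and several other Estimator parameters.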
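The precedence rule above can be checked before submitting a run. The sketch below is plain Python and does not call the SDK; `overridden_params` is a hypothetical helper, and the list of affected parameter names is my reading of the SDK page, so treat it as an assumption and verify it against the documentation.

```python
# Parameters the environment is ASSUMED to override (taken from the SDK page; verify).
OVERRIDDEN_BY_ENVIRONMENT = (
    "use_gpu",
    "custom_docker_image",
    "conda_packages",
    "pip_packages",
)


def overridden_params(estimator_kwargs):
    """List keyword arguments that environment_definition would supersede."""
    # Without an explicit environment, the Estimator builds its own from these.
    if estimator_kwargs.get("environment_definition") is None:
        return []
    return [name for name in OVERRIDDEN_BY_ENVIRONMENT if name in estimator_kwargs]


# Keyword arguments mirroring the snippet above; object() stands in for myenv.
kwargs = {
    "source_directory": ".",
    "use_gpu": True,
    "compute_target": "gpu-cluster-1",
    "pip_packages": ["scipy==1.1.0", "torch==1.5.1"],
    "entry_script": "AzureEntry.py",
    "environment_definition": object(),
}

conflicts = overridden_params(kwargs)
```

Any name that shows up in `conflicts` is a setting that probably belongs in the environment definition instead of the Estimator call.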

This should work:

from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.train.estimator import Estimator
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import Experiment

ws = Workspace (...)
exp = Experiment(ws, 'test-so-exp')

myenv = Environment(name = "myenv")
myenv.docker.enabled = True
dockerfile = r"""
FROM mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04
RUN apt-get update && apt-get install -y libgl1-mesa-glx
RUN echo "Hello from custom container!"
"""
myenv.docker.base_image = None
myenv.docker.base_dockerfile = dockerfile

## Put your packages in the Environment definition instead of passing them to the Estimator.
## See below for a few more changes.

myenv.python.conda_dependencies = CondaDependencies.create(pip_packages = ['scipy==1.1.0', 'torch==1.5.1'])

Finally, you can construct the estimator slightly differently:

est = Estimator(
    source_directory = '.',
#     script_params = script_params,
#     use_gpu = True,
    compute_target = 'gpu-cluster-1',
#     pip_packages = ['scipy==1.1.0', 'torch==1.5.1'],
    entry_script = 'AzureEntry.py',
    environment_definition=myenv
    )

And submit:

run = exp.submit(config = est)
run.wait_for_completion(show_output=True)

Let us know if that works.