Restrict the number of nodes used by an Azure Machine Learning pipeline

Question

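I have written a pipeline that I want to run on a remote compute cluster in Azure Machine Learning. My goal is to process a large amount of historical data, and to do that I need to run the pipeline for a large number of input parameter combinations.

Is there a way to restrict the number of nodes the pipeline uses on the cluster? By default it uses every node available to the cluster, and I would like to cap it at a predefined maximum. That would leave the rest of the cluster free for other users.

My current code to start the pipeline looks like this: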

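# Assumes `ws` (the Workspace) and `data_import_step` are defined earlier
import pandas as pd
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Set up the pipeline
steps = [data_import_step]  # Contains a PythonScriptStep
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline.validate()

# Long list of historical datetimes that I want to process data for
# (pandas >= 1.4; older versions spell the last argument closed='left')
dts = pd.date_range('2019-01-01', '2020-01-01', freq='6H', inclusive='left')

# Submit one pipeline run per datetime
experiment = Experiment(ws, 'my-pipeline-run')
for dt in dts:
    pipeline_run = experiment.submit(
        pipeline,
        pipeline_parameters={
            'import_datetime': dt.strftime('%Y-%m-%dT%H:00'),
        }
    )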

Answer 1

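For me, the killer feature of Azure ML is not having to worry about load balancing like this. Our team has a compute target with max_nodes=100 for every feature branch, and we have Hyperdrive pipelines that each result in 130 runs.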

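We can submit multiple PipelineRuns back to back, and the orchestrator does the heavy lifting of queuing and submitting all the runs, so that the PipelineRuns execute in the order I submitted them and the cluster is never overloaded. This works without issue for us 99% of the time.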

If what you are looking for is to have the PipelineRuns execute in parallel, then you should check out ParallelRunStep.
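A minimal sketch of the ParallelRunStep route, assuming the v1 Python SDK (the azureml-pipeline-steps package). Note that ParallelRunStep spreads the mini-batches of a single step across nodes, and the node_count argument of ParallelRunConfig is what bounds how many nodes that step uses. The helper name build_parallel_step, the entry script batch_entry.py, and the error_threshold value are hypothetical placeholders; the environment, compute target, dataset input, and output are assumed to exist already.

```python
def build_parallel_step(name, environment, compute_target, dataset_input,
                        output, max_nodes, source_directory=".",
                        entry_script="batch_entry.py"):
    """Build a ParallelRunStep that uses at most `max_nodes` nodes."""
    # Imported lazily so this module loads without the Azure ML SDK installed.
    from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

    config = ParallelRunConfig(
        source_directory=source_directory,
        entry_script=entry_script,   # hypothetical script defining init() and run(mini_batch)
        environment=environment,
        compute_target=compute_target,
        node_count=max_nodes,        # upper bound on nodes used by this step
        error_threshold=10,          # illustrative value, not a recommendation
        output_action="append_row",
    )
    return ParallelRunStep(
        name=name,
        parallel_run_config=config,
        inputs=[dataset_input],
        output=output,
        allow_reuse=False,
    )
```

The returned step would go into the steps list of the Pipeline in place of (or alongside) the PythonScriptStep from the question.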

Another option is to isolate your compute. You can have up to 200 ComputeTargets per workspace. Two 50-node ComputeTargets cost the same as one 100-node ComputeTarget.
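One way to apply this, sketched under the assumption that the v1 Python SDK (azureml-core) is in use: provision a separate AmlCompute cluster with a small max_nodes and point the pipeline at it. The helper names cluster_settings and get_or_create_cluster and the default VM size are hypothetical; ws is the Workspace from the question.

```python
def cluster_settings(max_nodes, vm_size="STANDARD_DS3_V2"):
    """Keyword arguments for AmlCompute.provisioning_configuration."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    # min_nodes=0 lets the cluster scale down to nothing when idle.
    return {"vm_size": vm_size, "min_nodes": 0, "max_nodes": max_nodes}


def get_or_create_cluster(ws, name, max_nodes):
    """Return the compute target `name`, creating it capped at `max_nodes`."""
    # Imported lazily so this module loads without the Azure ML SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if it already exists in the workspace.
        return ComputeTarget(workspace=ws, name=name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(**cluster_settings(max_nodes))
        target = ComputeTarget.create(ws, name, config)
        target.wait_for_completion(show_output=True)
        return target
```

The PythonScriptStep would then receive the returned target through its compute_target argument, so the submitted runs queue on the small cluster and leave the shared one alone.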

On our team, we use pygit2 to create a ComputeTarget for each feature branch, so that, as data scientists, we can be confident we are not stepping on our coworkers' toes.
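A rough sketch of how a branch name could be turned into a compute target name. The helper names and the sanitizing rules are assumptions for illustration, not the team's actual code; in particular the 16-character limit is a guess, so check the naming rules for your workspace before relying on it.

```python
import re


def branch_to_compute_name(branch, max_len=16):
    """Turn a git branch name into a lowercase, hyphen-separated name."""
    # Replace every run of characters other than a-z, 0-9 and '-' with '-'.
    name = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    # Truncate (max_len is an assumed limit) and drop any trailing hyphen.
    return name[:max_len].rstrip("-")


def current_branch(repo_path="."):
    """Read the checked-out branch name with pygit2."""
    # Imported lazily so the pure helper above works without pygit2.
    import pygit2

    return pygit2.Repository(repo_path).head.shorthand
```

For example, branch_to_compute_name("feature/Add_New-Model") gives "feature-add-new". The result of branch_to_compute_name(current_branch()) can then be used as the name of the branch's ComputeTarget.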

python

azure

azure-machine-learning-studio

azure-machine-learning-service