ScriptRunConfig with datastore reference on AML

Question

I am trying to run a ScriptRunConfig using:

src = ScriptRunConfig(source_directory=project_folder, 
                      script='train.py', 
                      arguments=['--input-data-dir', ds.as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config) 
run = experiment.submit(config=src)

When I submit the job, it fails with:

... lots of things... and then
TypeError: Object of type 'DataReference' is not JSON serializable

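However, if I run the same thing with an Estimator, it works. One difference is that with ScriptRunConfig the arguments are passed as a list, whereas with the Estimator they are passed as a dictionary.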

Thanks for any pointers!

Answer 1

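Using a DataReference in ScriptRunConfig is a bit more involved than just passing ds.as_mount(). You need to convert it to a string in arguments, and then update the data_references section of the RunConfiguration with a DataReferenceConfiguration created from ds. Please see here for an example notebook on how to do that.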
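To illustrate why the cast to a string matters, here is a minimal sketch that reproduces the same kind of TypeError using only the standard library. FakeDataReference is a hypothetical stand-in, not the real azureml class, and the placeholder string it returns is an assumption made for illustration; the point is only that an arbitrary object inside the arguments list cannot be JSON-serialized, while its string form can.

```python
import json


class FakeDataReference:
    """Hypothetical stand-in for a DataReference (NOT the real azureml class)."""

    def __init__(self, name):
        self.data_reference_name = name

    def __str__(self):
        # Assumed placeholder format, used here for illustration only.
        return f"$AZUREML_DATAREFERENCE_{self.data_reference_name}"


def serialize_arguments(arguments):
    # Mimics a submission step that turns the argument list into JSON.
    return json.dumps(arguments)


ref = FakeDataReference("mydata")

# Passing the object itself fails, because json cannot serialize it.
try:
    serialize_arguments(["--input-data-dir", ref, "--reg", "0.99"])
    raw_failed = False
except TypeError:
    raw_failed = True

# Passing str(ref) works, because the list now contains only strings.
serialized = serialize_arguments(["--input-data-dir", str(ref), "--reg", "0.99"])
```

With the string form in arguments, the run configuration still has to be told what that placeholder refers to, which is what the data_references update is for.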

If you are only reading from the input location and not writing to it, take a look at Dataset. It lets you do exactly what you are doing without any extra steps. Here is an example notebook that shows it in action.

Below is a condensed version of the notebook:

from azureml.core import Dataset, Datastore, ScriptRunConfig

# more imports and code

ds = Datastore(workspace, 'mydatastore')
dataset = Dataset.File.from_files(path=(ds, 'path/to/input-data/within-datastore'))

src = ScriptRunConfig(source_directory=project_folder, 
                      script='train.py', 
                      arguments=['--input-data-dir', dataset.as_named_input('input').as_mount(),
                                 '--reg', '0.99'],
                      run_config=run_config) 
run = experiment.submit(config=src)

Answer 2

You can find this in the official documentation, on the page how-to-migrate-from-estimators-to-scriptrunconfig.

The core code for using a DataReference in ScriptRunConfig is:

# if you want to pass a DataReference object, such as the below:
datastore = ws.get_default_datastore()
data_ref = datastore.path('./foo').as_mount()

src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      arguments=['--data-folder', str(data_ref)], # cast the DataReference object to str
                      compute_target=compute_target,
                      environment=pytorch_env)
# Set the `data_references` attribute of the ScriptRunConfig's underlying
# RunConfiguration object to a dict of the DataReference(s) you want to use.
src.run_config.data_references = {data_ref.data_reference_name: data_ref.to_config()}
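On the receiving side, the training script just sees an ordinary path string. The following is a minimal sketch of what a train.py might look like, assuming the argument names used in this thread ('--data-folder' and '--reg'); the function names parse_args and list_input_files and the default value for --reg are hypothetical, chosen only for this example.

```python
import argparse
import os


def parse_args(argv=None):
    # Argument names follow the ScriptRunConfig examples in this thread.
    parser = argparse.ArgumentParser(description="Hypothetical train.py entry point")
    parser.add_argument("--data-folder", type=str, required=True,
                        help="Path where the datastore is mounted on the compute target")
    parser.add_argument("--reg", type=float, default=0.01,
                        help="Regularization rate (default is an assumption)")
    return parser.parse_args(argv)


def list_input_files(folder):
    # Return the sorted file names in the folder, or an empty list if it is missing.
    if not os.path.isdir(folder):
        return []
    return sorted(os.listdir(folder))


# Example invocation with an explicit argument list instead of sys.argv.
args = parse_args(["--data-folder", "/tmp/does-not-exist-example", "--reg", "0.99"])
files = list_input_files(args.data_folder)
```

Note that argparse converts '--data-folder' into the attribute name data_folder, so the script reads it as args.data_folder.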


azure-machine-learning-service