如何在训练步骤中结合 AzureML SDK 中的管道和超参数

How to combine pipeline and hyperparameter in AzureML SDK in the training step

缩写形式: 我想弄清楚如何在管道中的 训练步骤 (即 train_step = PythonScriptStep(...))中 运行 超参数,我我不确定我应该把“config=hyperdrive”放在哪里

长格式:

一般:

# Register the environment 
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')

# Create a new runconfig object for the pipeline
run_config = RunConfiguration()

# Use the compute you created above. 
run_config.target = ComputerTarget_Crea

# Assign the environment to the run configuration
run_config.environment = registered_env

超参数:

script_config = ScriptRunConfig(source_directory=experiment_folder,
                                script='diabetes_training.py',
                                # Add non-hyperparameter arguments -in this case, the training dataset
                                arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
                                environment=sklearn_env,
                                compute_target = training_cluster)

# Sample a range of parameter values
params = GridParameterSampling(
    {
        # Hyperdrive will try 6 combinations, adding these as script arguments
        '--learning_rate': choice(0.01, 0.1, 1.0),
        '--n_estimators' : choice(10, 100)
    }
)

# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(run_config=script_config, 
                          hyperparameter_sampling=params, 
                          policy=None, # No early stopping policy
                          primary_metric_name='AUC', # Find the highest AUC metric
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                          max_total_runs=6, # Restict the experiment to 6 iterations
                          max_concurrent_runs=2) # Run up to 2 iterations in parallel

# Run the experiment if I only want to run hyperparam alone without the pipeline
#experiment = Experiment(workspace=ws, name='mslearn-diabetes-hyperdrive')
#run = experiment.submit(**config=hyperdrive**)

流水线:

prep_step = PythonScriptStep(name = "Prepare Data",
                                source_directory = experiment_folder,
                                script_name = "prep_diabetes.py",
                                arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
                                             '--prepped-data', prepped_data_folder],
                                outputs=[prepped_data_folder],
                                compute_target = ComputerTarget_Crea,
                                runconfig = run_config,
                                allow_reuse = True)

# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
                                source_directory = experiment_folder,
                                script_name = "train_diabetes.py",
                                arguments = ['--training-folder', prepped_data_folder],
                                inputs=[prepped_data_folder],
                                compute_target = ComputerTarget_Crea,
                                runconfig = run_config,
                                allow_reuse = True)
# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")

# Create an experiment and run the pipeline
**#How do I need to change these below lines to use hyperdrive????**
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)

不确定我需要将 config=hyperdrive 放在管道部分的什么位置?

以下是将超参数与 AML 管道相结合的方法:https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.hyperdrivestep?view=azure-ml-py

或者,这是一个示例笔记本:https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb