如何在训练步骤中结合 AzureML SDK 中的管道和超参数
How to combine pipeline and hyperparameter in AzureML SDK in the training step
缩写形式:
我想弄清楚如何在管道中的 训练步骤 (即 train_step = PythonScriptStep(...))中 运行 超参数,我我不确定我应该把“config=hyperdrive”放在哪里
长格式:
一般:
# Register the environment
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')
# Create a new runconfig object for the pipeline
run_config = RunConfiguration()
# Use the compute you created above.
run_config.target = ComputerTarget_Crea
# Assign the environment to the run configuration
run_config.environment = registered_env
超参数:
script_config = ScriptRunConfig(source_directory=experiment_folder,
script='diabetes_training.py',
# Add non-hyperparameter arguments -in this case, the training dataset
arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
environment=sklearn_env,
compute_target = training_cluster)
# Sample a range of parameter values
params = GridParameterSampling(
{
# Hyperdrive will try 6 combinations, adding these as script arguments
'--learning_rate': choice(0.01, 0.1, 1.0),
'--n_estimators' : choice(10, 100)
}
)
# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(run_config=script_config,
hyperparameter_sampling=params,
policy=None, # No early stopping policy
primary_metric_name='AUC', # Find the highest AUC metric
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=6, # Restict the experiment to 6 iterations
max_concurrent_runs=2) # Run up to 2 iterations in parallel
# Run the experiment if I only want to run hyperparam alone without the pipeline
#experiment = Experiment(workspace=ws, name='mslearn-diabetes-hyperdrive')
#run = experiment.submit(**config=hyperdrive**)
流水线:
prep_step = PythonScriptStep(name = "Prepare Data",
source_directory = experiment_folder,
script_name = "prep_diabetes.py",
arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
'--prepped-data', prepped_data_folder],
outputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
source_directory = experiment_folder,
script_name = "train_diabetes.py",
arguments = ['--training-folder', prepped_data_folder],
inputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")
# Create an experiment and run the pipeline
**#How do I need to change these below lines to use hyperdrive????**
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
不确定我需要将 config=hyperdrive 放在管道部分的什么位置?
缩写形式: 我想弄清楚如何在管道中的 训练步骤 (即 train_step = PythonScriptStep(...))中 运行 超参数,我我不确定我应该把“config=hyperdrive”放在哪里
长格式:
一般:
# Register the environment
diabetes_env.register(workspace=ws)
registered_env = Environment.get(ws, 'diabetes-pipeline-env')
# Create a new runconfig object for the pipeline
run_config = RunConfiguration()
# Use the compute you created above.
run_config.target = ComputerTarget_Crea
# Assign the environment to the run configuration
run_config.environment = registered_env
超参数:
script_config = ScriptRunConfig(source_directory=experiment_folder,
script='diabetes_training.py',
# Add non-hyperparameter arguments -in this case, the training dataset
arguments = ['--input-data', diabetes_ds.as_named_input('training_data')],
environment=sklearn_env,
compute_target = training_cluster)
# Sample a range of parameter values
params = GridParameterSampling(
{
# Hyperdrive will try 6 combinations, adding these as script arguments
'--learning_rate': choice(0.01, 0.1, 1.0),
'--n_estimators' : choice(10, 100)
}
)
# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(run_config=script_config,
hyperparameter_sampling=params,
policy=None, # No early stopping policy
primary_metric_name='AUC', # Find the highest AUC metric
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=6, # Restict the experiment to 6 iterations
max_concurrent_runs=2) # Run up to 2 iterations in parallel
# Run the experiment if I only want to run hyperparam alone without the pipeline
#experiment = Experiment(workspace=ws, name='mslearn-diabetes-hyperdrive')
#run = experiment.submit(**config=hyperdrive**)
流水线:
prep_step = PythonScriptStep(name = "Prepare Data",
source_directory = experiment_folder,
script_name = "prep_diabetes.py",
arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
'--prepped-data', prepped_data_folder],
outputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Step 2, run the training script
train_step = PythonScriptStep(name = "Train and Register Model",
source_directory = experiment_folder,
script_name = "train_diabetes.py",
arguments = ['--training-folder', prepped_data_folder],
inputs=[prepped_data_folder],
compute_target = ComputerTarget_Crea,
runconfig = run_config,
allow_reuse = True)
# Construct the pipeline
pipeline_steps = [prep_step, train_step]
pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")
# Create an experiment and run the pipeline
**#How do I need to change these below lines to use hyperdrive????**
experiment = Experiment(workspace=ws, name = 'mslearn-diabetes-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
不确定我需要将 config=hyperdrive 放在管道部分的什么位置?