How to pass the experiment configuration to a SageMakerTrainingOperator while training?

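Thoughts:

I am using training_config to build the dict that passes the training configuration for a TensorFlow estimator to the operator, but it has no parameter for passing the experiment configuration.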

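import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.workflow.airflow import training_config
# Import path for recent apache-airflow-providers-amazon releases; older Airflow versions use a different module.
from airflow.providers.amazon.aws.operators.sagemaker import SageMakerTrainingOperator

tf_estimator = TensorFlow(entry_point='train_model.py',
                          source_dir=source,
                          role=sagemaker.get_execution_role(),
                          instance_count=1,
                          framework_version='2.3.0',
                          instance_type=instance_type,
                          py_version='py37',
                          script_mode=True,
                          enable_sagemaker_metrics=True,
                          metric_definitions=metric_definitions,
                          output_path=output)

# Build the CreateTrainingJob-style dict that the Airflow operator consumes.
model_training_config = training_config(
    estimator=tf_estimator,
    inputs=input,
    job_name=training_jobname,
)

training_task = SageMakerTrainingOperator(
    task_id=test_id,
    config=model_training_config,
    aws_conn_id="airflow-sagemaker",
    print_log=True,
    wait_for_completion=True,
    check_interval=60,
)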

You can use experiment_config in estimator.fit. A more detailed example can be found here
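
A minimal sketch of that suggestion, assuming the SageMaker Python SDK's estimator.fit accepts an experiment_config dict with the keys ExperimentName, TrialName and TrialComponentDisplayName. The helper names build_experiment_config and fit_with_experiment are hypothetical and only serve the illustration; they are not part of any library.

```python
from typing import Any, Dict, Optional


def build_experiment_config(experiment_name: str,
                            trial_name: str,
                            display_name: Optional[str] = None) -> Dict[str, str]:
    """Build the dict passed as experiment_config (hypothetical helper)."""
    config = {"ExperimentName": experiment_name, "TrialName": trial_name}
    if display_name is not None:
        # Optional label for the trial component created by the job.
        config["TrialComponentDisplayName"] = display_name
    return config


def fit_with_experiment(estimator: Any, inputs: Any, job_name: str,
                        experiment_config: Dict[str, str]) -> None:
    """Start training through the SDK and attach the experiment config."""
    estimator.fit(inputs=inputs, job_name=job_name,
                  experiment_config=experiment_config)
```

With the estimator from the question this could be called as fit_with_experiment(tf_estimator, input, training_jobname, build_experiment_config("my-experiment", "my-trial")), where the experiment and trial names are placeholders. Note that this launches the job from the SDK directly rather than through the Airflow operator.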

The only way I have found so far is to use the CreateTrainingJob API (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-RoleArn). The following points apply:

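  • I am not sure whether this works with the bring-your-own-script method, e.g. with a TensorFlow estimator
  • It works with the build-your-own-container method
  • Using the CreateTrainingJob API, I created the config, which in turn includes all the required configuration (training, experiment, algorithm, etc.), and passed it to the SageMakerTrainingOperator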
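
The last point can be sketched as follows, assuming the operator forwards its config dict unchanged to CreateTrainingJob, so that an added ExperimentConfig key reaches the API. The function name add_experiment_config is hypothetical, and whether this works for a script-mode estimator config is not confirmed above.

```python
import copy
from typing import Any, Dict, Optional


def add_experiment_config(training_job_config: Dict[str, Any],
                          experiment_name: str,
                          trial_name: str,
                          display_name: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of a CreateTrainingJob-style config with ExperimentConfig set.

    Hypothetical helper: the input dict is left unmodified.
    """
    config = copy.deepcopy(training_job_config)
    experiment = {"ExperimentName": experiment_name, "TrialName": trial_name}
    if display_name is not None:
        experiment["TrialComponentDisplayName"] = display_name
    # ExperimentConfig is a top-level field of the CreateTrainingJob request.
    config["ExperimentConfig"] = experiment
    return config
```

The result would then be handed to the operator in place of the original dict, for example config=add_experiment_config(model_training_config, "my-experiment", "my-trial"), with placeholder experiment and trial names.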