AWS SageMaker Workflow pipeline: use the code stored in an artifact created by CodeBuild

I created a sagemaker.workflow.pipeline.Pipeline object with several processing steps. I am trying to reference S3 file paths instead of local file paths, so that the files are not uploaded to S3 on every pipeline run.

My question is: can I modify the step / script processor / pipeline object so that it references the code inside the artifact created by AWS CodeBuild?

If not, can I first use CodeBuild to copy my local files to a specific S3 location (I have run into permission issues with this so far) and then run the pipeline?

For reference:

...
step_data_ingest = ProcessingStep(
        name="DataIngestion",
        processor=sklearn_data_ingest_processor,
        inputs=[
            ProcessingInput(
                input_name="input_train_data",
                source=input_data, 
                destination="/opt/ml/processing/input/data/train"
            ),
            ProcessingInput(
                input_name="input_test_data",
                source=test_data, 
                destination="/opt/ml/processing/input/data/test"
            ),
            ProcessingInput(
                input_name="requirement_file",
                source=os.path.join(code_dir, "requirements.txt"), 
                destination="/opt/ml/processing/input/requirement"
            ),
        ],
        outputs=[
            ProcessingOutput(
                output_name="train", 
                source="/opt/ml/processing/output/train",
                destination=get_projection_s3_dir(experiment_dir, "datasets/train")
            ),
            ProcessingOutput(
                output_name="validation", 
                source="/opt/ml/processing/output/validation",
                destination=get_projection_s3_dir(experiment_dir, "datasets/validation")
            ),
            ProcessingOutput(
                output_name="test", 
                source="/opt/ml/processing/output/test",
                destination=get_projection_s3_dir(experiment_dir, "datasets/test")
            ),
            ProcessingOutput(
                output_name="sample", 
                source="/opt/ml/processing/output/sample",
                destination=get_projection_s3_dir(experiment_dir, "datasets/sample")
            ),
        ],
        code=os.path.join(code_dir, "data_ingestion.py"),
        # I would like this to be something like s3://some_code_dir/data_ingestion.py
        job_arguments=["-c", country,
                       "-v", train_val_split_percentage],
    )
...

What I hope to do is:

# in the processing step or processor
ProcessingStep(
    ...
    code="data_ingestion.py",
    code_location="s3://some_artifact_bucket/buildartifact/fdskz.zip",
    ...
)

# or, in the processing step or processor
ProcessingStep(
    ...
    code="s3://some_artifact_bucket/buildartifact/fdsix/data_ingestion.py",
    ...
)

# in buildspec.yml for CodeBuild
aws s3 sync ./code_dir/ s3://some_code_dir/
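
After the sync finishes, I would then kick off the pipeline, roughly like this (just a sketch: pipeline is the sagemaker.workflow.pipeline.Pipeline object mentioned above, and role is assumed to be an execution role ARN defined elsewhere):

# rough sketch: register/update the pipeline definition, then launch a run;
# `pipeline` is the Pipeline object from above, `role` (execution role ARN)
# is assumed to be defined elsewhere
pipeline.upsert(role_arn=role)
execution = pipeline.start()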

When using ProcessingStep, you can use an S3 URI as the code location; see this for reference.
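
For example, here is a minimal sketch of the step above with an S3 URI (the processor, inputs, and outputs are unchanged from the question, and the bucket path is the question's placeholder):

from sagemaker.workflow.steps import ProcessingStep

step_data_ingest = ProcessingStep(
    name="DataIngestion",
    processor=sklearn_data_ingest_processor,  # as defined in the question
    # `code` accepts an S3 URI as well as a local path, so the script is
    # not re-uploaded from the local machine on every pipeline run
    code="s3://some_code_dir/data_ingestion.py",
    job_arguments=["-c", country,
                   "-v", train_val_split_percentage],
    # inputs=[...] and outputs=[...] stay exactly as in the original step
)

Note that, as far as I can tell, code has to point at a plain .py file rather than at the zipped build artifact, so syncing the unzipped code_dir from CodeBuild (as in the buildspec line above) is the way to get it into place.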