AWS SageMaker Workflow pipeline: use the code stored in an artifact created by CodeBuild
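I created a sagemaker.workflow.pipeline.Pipeline object with several processing steps. I am trying to reference S3 file paths instead of local file paths, so that the files are not uploaded to S3 every time the pipeline runs.
My question is: can I modify the step, scriptprocessor, or pipeline object so that it references the code in the artifact created by AWS CodeBuild?
If not, can I first use CodeBuild to copy my local files to a specific S3 location (so far I have run into permission issues) and then run the pipeline?
For reference: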
...
step_data_ingest = ProcessingStep(
    name="DataIngestion",
    processor=sklearn_data_ingest_processor,
    inputs=[
        ProcessingInput(
            input_name="input_train_data",
            source=input_data,
            destination="/opt/ml/processing/input/data/train"
        ),
        ProcessingInput(
            input_name="input_test_data",
            source=test_data,
            destination="/opt/ml/processing/input/data/test"
        ),
        ProcessingInput(
            input_name="requirement_file",
            source=os.path.join(code_dir, "requirements.txt"),
            destination="/opt/ml/processing/input/requirement"
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/output/train",
            destination=get_projection_s3_dir(experiment_dir, "datasets/train")
        ),
        ProcessingOutput(
            output_name="validation",
            source="/opt/ml/processing/output/validation",
            destination=get_projection_s3_dir(experiment_dir, "datasets/validation")
        ),
        ProcessingOutput(
            output_name="test",
            source="/opt/ml/processing/output/test",
            destination=get_projection_s3_dir(experiment_dir, "datasets/test")
        ),
        ProcessingOutput(
            output_name="sample",
            source="/opt/ml/processing/output/sample",
            destination=get_projection_s3_dir(experiment_dir, "datasets/sample")
        ),
    ],
    code=os.path.join(code_dir, "data_ingestion.py"),
    # something like s3://some_code_dir/data_ingestion.py
    job_arguments=["-c", country,
                   "-v", train_val_split_percentage],
)
...
What I am hoping to do is:
# in processing step or processor
ProcessingStep(
    ...
    code="data_ingestion.py",
    code_location="s3://some_artifact_bucket/buildartifact/fdskz.zip",
    ...
)
or
# in processing step or processor
ProcessingStep(
    ...
    code="s3://some_artifact_bucket/buildartifact/fdsix/data_ingestion.py",
    ...
)
or
# in buildspec.yml for codebuild
aws s3 sync ./code_dir/ s3://some_code_dir/
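For this third option, the permission issue mentioned above may come from the CodeBuild service role lacking S3 access. As a rough sketch only: `aws s3 sync` generally needs to list the bucket and to write objects into it, so a policy along these lines could be attached to that role. The helper name `codebuild_s3_sync_policy` and the `code/` prefix are hypothetical, and the standard `arn:aws` partition is assumed.

```python
import json


def codebuild_s3_sync_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document for uploading with `aws s3 sync`.

    Assumption: the standard `aws` partition; other partitions use a
    different ARN prefix.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    clean_prefix = prefix.strip("/")
    # Objects live under the bucket ARN, optionally below a key prefix.
    object_arn = f"{bucket_arn}/{clean_prefix + '/' if clean_prefix else ''}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing is a bucket-level action.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": [bucket_arn]},
            # Reading and writing are object-level actions.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [object_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Bucket name taken from the buildspec line above; the prefix is illustrative.
    print(json.dumps(codebuild_s3_sync_policy("some_code_dir", "code/"), indent=2))
```

The role that runs the pipeline would also need read access to the same location in order to fetch the script.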
When you use ProcessingStep, you can pass an S3 URI as the code location; see this for reference.
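A minimal sketch of what that could look like, assuming the script has already been copied to S3 as a plain object (not inside a zip archive) and that the pipeline's role can read it. The helpers `is_s3_uri`, `s3_code_uri` and `build_ingest_step` are hypothetical names introduced for illustration; only `ProcessingStep` and its `name`, `processor` and `code` arguments come from the SageMaker SDK.

```python
from urllib.parse import urlparse


def is_s3_uri(path: str) -> bool:
    """Return True when the path looks like s3://bucket/key."""
    parsed = urlparse(path)
    return parsed.scheme == "s3" and bool(parsed.netloc)


def s3_code_uri(prefix: str, filename: str) -> str:
    """Join an S3 prefix and a script name into a single S3 URI."""
    if not is_s3_uri(prefix):
        raise ValueError(f"expected an s3:// prefix, got {prefix!r}")
    return f"{prefix.rstrip('/')}/{filename}"


def build_ingest_step(processor, code_prefix: str):
    """Create the DataIngestion step with its code referenced from S3."""
    # Imported lazily so the helpers above work without the SDK installed.
    from sagemaker.workflow.steps import ProcessingStep

    return ProcessingStep(
        name="DataIngestion",
        processor=processor,
        # An s3:// URI is used as-is, so no local upload should be needed.
        code=s3_code_uri(code_prefix, "data_ingestion.py"),
    )
```

Calling `build_ingest_step(sklearn_data_ingest_processor, "s3://some_code_dir/")` would then point the step at `s3://some_code_dir/data_ingestion.py`. The `source` of the `requirements.txt` input can be given as an S3 URI in the same way.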