How do I set the path of a bucket in an Amazon SageMaker Jupyter notebook?

I'm new to AWS. How do I set the path to my bucket and access the files in it?

Do I need to change anything about the prefix?

import os
import boto3
import re
import copy
import time
from time import gmtime, strftime
from sagemaker import get_execution_role

role = get_execution_role()

region = boto3.Session().region_name

bucket='ltfs1' # Replace with your s3 bucket name
prefix = 'sagemaker/ltfs1' # Used as part of the path in the bucket where you store data
# bucket_path = 'https://s3-{}.amazonaws.com/{}'.format(region,bucket) # The URL to access the bucket

I'm using the code above, but it raises a file-not-found error.

If the file you're accessing is in the root of your S3 bucket, you can read it like this:

import pandas as pd

bucket='ltfs1'
data_key = 'data.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
training_data = pd.read_csv(data_location)
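
If the file sits under a prefix rather than in the bucket root, the prefix has to become part of the object key. Here is a minimal sketch of building that path. The helper name `s3_uri` is made up for illustration, and `data.csv` under `sagemaker/ltfs1` is an assumed location, not something confirmed by the question:

```python
def s3_uri(bucket, *key_parts):
    """Join a bucket name and key segments into an s3:// URI."""
    # Strip stray slashes so 'sagemaker/ltfs1/' and '/data.csv' join cleanly.
    cleaned = [part.strip("/") for part in key_parts if part and part.strip("/")]
    return "s3://{}/{}".format(bucket, "/".join(cleaned))


bucket = "ltfs1"
prefix = "sagemaker/ltfs1"  # folder-like path inside the bucket
data_key = "data.csv"       # assumed file name, for illustration only

data_location = s3_uri(bucket, prefix, data_key)
print(data_location)  # s3://ltfs1/sagemaker/ltfs1/data.csv

# Reading the file with pandas needs the s3fs package installed and
# read access to the bucket from the notebook's execution role:
# import pandas as pd
# training_data = pd.read_csv(data_location)
```

The bucket name goes in only once, right after `s3://`. Everything after it is the object key, so the prefix and the file name are joined with `/`.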

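You need to use "sage.session.s3_input" to specify the location of the S3 bucket that holds the training data. (In version 2 of the SageMaker Python SDK this class was renamed to sagemaker.inputs.TrainingInput.)

Example code: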

import sagemaker as sage
from sagemaker import get_execution_role

role = get_execution_role()
sess = sage.Session()

bucket= 'dev.xxxx.sagemaker'
prefix="EstimatorName"

s3_training_file_location = "s3://{}/csv".format(bucket) 
data_location_config = sage.session.s3_input(s3_data=s3_training_file_location, content_type="csv")

output_path="s3://{}/{}".format(bucket,prefix)


account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/CustomEstimator:latest'.format(account, region)
print(image) 
# Example output: xxxxxx.dkr.ecr.us-east-1.amazonaws.com/CustomEstimator:latest

tree = sage.estimator.Estimator(image,
                       role, 1, 'ml.c4.2xlarge',
                       base_job_name='CustomJobName',
                       code_location=output_path,
                       output_path=output_path,
                       sagemaker_session=sess)

tree.fit(data_location_config)
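
Before calling fit, it can help to confirm that the S3 location passed as s3_data actually contains objects, since a misspelled bucket or prefix is one likely cause of a file-not-found error. The following is a rough sketch, not part of the answer's code. The helper name `list_keys` is hypothetical, and it takes the S3 client as an argument so it works with any object exposing boto3's `list_objects_v2` call:

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key in `bucket` that starts with `prefix`."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # 'Contents' is absent when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Results are paged; request the next page with the returned token.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


# Usage in a notebook (needs boto3 and permission to list the bucket):
# import boto3
# print(list_keys(boto3.client("s3"), "dev.xxxx.sagemaker", "csv"))
```

An empty list means nothing is stored under that prefix, so the path is worth rechecking before you start the training job.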