AWS uploading file into wrong bucket
I am using AWS SageMaker and trying to upload a data folder from SageMaker to S3. What I am trying to do is upload my data to the s3_train_data directory (the directory exists in S3). However, instead of uploading into that bucket, it uploads into the default bucket that was created and then creates a new folder path built from the s3_train_data variable.
The code entered in the notebook:
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
bucket = <bucket name>
prefix = <folders1/folders2>
key = <input>
s3_train_data = 's3://{}/{}/{}/'.format(bucket, prefix, key)
#path 'data' is the folder in the Jupyter Instance, contains all the training data
inputs = sagemaker_session.upload_data(path= 'data', key_prefix= s3_train_data)
Is the problem in my code, or in the way I created the notebook?
You can look at the sample notebooks for how to upload data to an S3 bucket. There are many ways; I am just giving you a hint toward the answer.
Also, you forgot to create a boto3 session to access the S3 bucket.
Here is one way to do it.
import os
import urllib.request
import boto3

def download(url):
    # Download the file only if it is not already present locally.
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(channel, file):
    # `bucket` is your target S3 bucket name, defined elsewhere.
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)

# caltech-256
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
upload_to_s3('train', 'caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
upload_to_s3('validation', 'caltech-256-60-val.rec')
Another way:
import io
import os
import boto3
import sagemaker.amazon.common as smac

bucket = '<your_s3_bucket_name_here>'  # enter your s3 bucket where you will copy data and model artifacts
prefix = 'sagemaker/breast_cancer_prediction'  # place to upload training files within the bucket
# do some processing (producing train_X, train_y, train_file), then prepare to push the data
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
YouTube link: https://www.youtube.com/watch?v=-YiHPIGyFGo - how to pull data from an S3 bucket.