AWS Glue Python FileNotFoundError: [Errno 2] No such file or directory
I'm trying to move files between cross-account S3 buckets using AWS Glue, running Glue as a Python shell job.
I have List and GetObject permissions on the source bucket. I can list all the files, but when I try to load a file into the destination bucket I get the error: FileNotFoundError: [Errno 2] No such file or directory: 'test/f1=x/type=b/file1.parquet'
The files in the source S3 bucket are partitioned:
test/f1=x/type=a/file1.parquet
test/f1=x/type=a/file2.parquet
test/f1=x/type=b/file1.parquet
test/f1=x/type=b/file2.parquet
I just want to load the files under f1=x and type=b:
import pandas as pd
import boto3

client = boto3.client('s3')
bucket = 'mysourcebucketname'
folder_path = 'test/f1=x/type=b/'

def my_keys(bucket, folder_path):
    keys = []
    resp = client.list_objects(Bucket=bucket, Prefix=folder_path)
    for obj in resp['Contents']:
        keys.append(obj['Key'])
    return keys

files = my_keys(bucket, folder_path)
# print(files)

for file in files:
    bucketdest = 'mydestinationbucket'
    new_file_name = file.split('/')[-1]
    s3_file = 'destfolder1/destfolder2/' + 'typeb' + new_file_name
    client.upload_file(file, bucketdest, s3_file, ExtraArgs={'GrantFullControl': 'id=""'})
upload_file is for uploading from a local drive to S3, so your code is looking for a local file named test/f1=x/type=b/file1.parquet, which obviously doesn't exist because, as you wrote, it lives on S3. Perhaps you meant to copy the files within S3 instead?
That can be achieved as follows:
from boto3.session import Session

def move_files(BUCKET, SOURCE, DESTINATION, FILENAME):
    session = Session(aws_access_key_id=<Access_ID>,
                      aws_secret_access_key=<Secret Key>)
    s3_resource = session.resource('s3')
    destination_key = DESTINATION + FILENAME
    source_key = SOURCE + FILENAME
    try:
        # Server-side copy within S3, then delete the original
        # to complete the "move".
        s3_resource.Object(BUCKET, destination_key).copy_from(
            CopySource=BUCKET + '/' + source_key)
        s3_resource.Object(BUCKET, source_key).delete()
    except Exception as error:
        print(error)
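For instance, you could drive move_files with the key list built in the question. A minimal usage sketch (note that move_files as written moves objects within a single bucket, and the prefixes below are the ones from the question):

SOURCE = 'test/f1=x/type=b/'
DESTINATION = 'destfolder1/destfolder2/'

for key in files:
    filename = key.split('/')[-1]   # e.g. 'file1.parquet'
    move_files(bucket, SOURCE, DESTINATION, filename)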
Also, make sure your IAM user has access to both buckets.
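If you want to keep your original loop and copy across the two accounts without downloading anything, boto3's client.copy performs a managed server-side copy. A minimal sketch, assuming the Glue job's role has s3:GetObject on the source bucket and s3:PutObject on the destination bucket (bucket and key names are taken from the question):

import boto3

client = boto3.client('s3')
bucket = 'mysourcebucketname'
bucketdest = 'mydestinationbucket'

for file in files:
    new_file_name = file.split('/')[-1]
    s3_file = 'destfolder1/destfolder2/' + 'typeb' + new_file_name
    # Object-to-object copy inside S3; no local filesystem involved.
    client.copy(
        CopySource={'Bucket': bucket, 'Key': file},
        Bucket=bucketdest,
        Key=s3_file,
        # Hand ownership to the destination account's bucket owner,
        # a common requirement for cross-account writes.
        ExtraArgs={'ACL': 'bucket-owner-full-control'},
    )

Because client.copy moves the bytes entirely on the S3 side, nothing needs to exist on the Glue job's local disk, which is exactly why upload_file failed.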