AWS Glue Python FileNotFoundError: [Errno 2] No such file or directory

I'm trying to move files between cross-account S3 buckets using AWS Glue (a Python shell job). I have list and get-object permissions on the source bucket. I can list all the files, but when I try to load a file into the destination bucket I get the error: FileNotFoundError: [Errno 2] No such file or directory: 'test/f1=x/type=b/file1.parquet'

The files in the source S3 bucket are partitioned:

test/f1=x/type=a/file1.parquet
test/f1=x/type=a/file2.parquet
test/f1=x/type=b/file1.parquet
test/f1=x/type=b/file2.parquet

I only want to load the files where f1=x and type=b:

import pandas as pd 
import boto3
         
client = boto3.client('s3')
bucket = 'mysourcebucketname' 
folder_path = 'test/f1=x/type=b/'
       
def my_keys(bucket,folder_path):
    keys = []
    resp = client.list_objects(Bucket=bucket, Prefix=folder_path)
    for obj in resp['Contents']:
        keys.append(obj['Key'])
    return keys
           
files = my_keys(bucket,folder_path)
#print(files)
     
for file in files:
    bucketdest = 'mydestinationbucket'
    new_file_name = file.split('/')[-1]
    s3_file = 'destfolder1/destfolder2/'+"typeb"+new_file_name
    client.upload_file(file, bucketdest, s3_file, ExtraArgs={'GrantFullControl': 'id=""'})

upload_file is for uploading a file from the local drive to S3. So your code is looking for a local file named test/f1=x/type=b/file1.parquet, which obviously doesn't exist, because it lives on S3. Perhaps you want to copy the files directly on S3 instead?
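One way, sketched with the names from your question (untested against a real bucket; assumes the Glue role's credentials can read the source and write the destination), is to swap upload_file for the client's copy, which performs a server-side S3-to-S3 copy:

```python
def dest_key_for(key):
    # Destination key exactly as built in the question:
    # fixed prefix + 'typeb' + the bare file name.
    return 'destfolder1/destfolder2/typeb' + key.split('/')[-1]

def copy_partition(files, src_bucket='mysourcebucketname',
                   dest_bucket='mydestinationbucket'):
    # boto3 is imported here so dest_key_for stays usable on its own.
    import boto3
    client = boto3.client('s3')
    for key in files:
        # copy() takes a CopySource dict and never touches the local disk,
        # so no FileNotFoundError can occur.
        client.copy({'Bucket': src_bucket, 'Key': key},
                    dest_bucket, dest_key_for(key))
```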

That can be achieved as follows:

from boto3.session import Session

def move_files(BUCKET, SOURCE, DESTINATION, FILENAME):
    session = Session(aws_access_key_id=<Access_ID>,
                      aws_secret_access_key=<Secret Key>)
    s3_resource = session.resource('s3')
    destination_key = DESTINATION + FILENAME
    source_key = SOURCE + FILENAME
    try:
        # Server-side copy, then delete the original -- i.e. a "move"
        s3_resource.Object(BUCKET, destination_key).copy_from(
            CopySource=BUCKET + '/' + source_key)
        s3_resource.Object(BUCKET, source_key).delete()
    except Exception as error:
        print(error)
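You could then drive move_files with the keys my_keys returns. A sketch (the helper below is mine, not part of the answer; note that move_files as written copies within a single bucket):

```python
def split_key(key):
    """Split an S3 key into (prefix ending in '/', bare file name)."""
    prefix, _, name = key.rpartition('/')
    return prefix + '/', name

prefix, name = split_key('test/f1=x/type=b/file1.parquet')
# prefix -> 'test/f1=x/type=b/', name -> 'file1.parquet'

# Then, for each key returned by my_keys (hypothetical wiring, untested):
# move_files('mysourcebucketname', prefix, 'destfolder1/destfolder2/', name)
```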

Also make sure your IAM user has access to both buckets.