List S3 objects at the first level only

Question

I am trying to list S3 objects like this:

for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
    logger.debug(key['Key'])

I only want to print the folder names or file names that exist at the first level.

For example, if my bucket contains this:

bucketname
    folder1
    folder2
        text1.txt
        text2.txt
    catalog.json

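I only want to print folder1, folder2 and catalog.json. I do not want to include text1.txt and so on.

However, my current solution also prints the names of the files inside the folders in the bucket.

How should I modify this? I see there is a 'Prefix' parameter, but I am not sure how to use it.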

Answer 1

You can split each key on "/" and keep only the first level:

level1 = set()  # Using a set removes duplicates automatically
for key in s3_client.list_objects(Bucket='bucketname')['Contents']:
    level1.add(key["Key"].split("/")[0])  # Keep only the first level of the key

# Then print the level1 set
logger.debug(level1)
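
To see what the split does without calling S3, here is a minimal sketch run on the keys from the question. The helper name first_level_names and the sample key list are made up for illustration, and the list assumes folder1 exists as an empty placeholder object with the key "folder1/":

```python
def first_level_names(keys):
    """Return the distinct first path segments of the given S3 keys."""
    # "folder2/text1.txt".split("/")[0] is "folder2"; a key without "/"
    # is returned unchanged, so top-level files are kept as they are.
    return {key.split("/")[0] for key in keys}


# Hypothetical keys mirroring the bucket layout in the question.
sample_keys = [
    "folder1/",
    "folder2/text1.txt",
    "folder2/text2.txt",
    "catalog.json",
]

print(sorted(first_level_names(sample_keys)))
# prints ['catalog.json', 'folder1', 'folder2']
```

Note that this approach still downloads the full key listing and cannot tell a folder from a file named the same way; it only trims the names afterwards.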

/!\ Warning

The list_objects method has been revised; according to the AWS S3 documentation, list_objects_v2 is recommended instead.

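This method returns some or all (up to 1,000) of the keys per call. To make sure you get every key, pass the NextContinuationToken from each response back as ContinuationToken on the next call: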

level1 = set()
continuation_token = ""
while continuation_token is not None:
    extra_params = {"ContinuationToken": continuation_token} if continuation_token else {}
    response = s3_client.list_objects_v2(Bucket="bucketname", Prefix="", **extra_params)
    continuation_token = response.get("NextContinuationToken")
    for obj in response.get("Contents", []):
        level1.add(obj.get("Key").split("/")[0])

logger.debug(level1)

Answer 2

Use the Delimiter option, for example:

import boto3

s3 = boto3.client("s3")
BUCKET = "bucketname"

rsp = s3.list_objects_v2(Bucket=BUCKET, Delimiter="/")

# "Contents" and "CommonPrefixes" are omitted from the response when empty
objects = [obj["Key"] for obj in rsp.get("Contents", [])]
folders = [fld["Prefix"] for fld in rsp.get("CommonPrefixes", [])]

for obj in objects:
    print("Object:", obj)

for folder in folders:
    print("Folder:", folder)

Result:

Object: catalog.json
Folder: folder1/
Folder: folder2/
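
Since the question also asks about Prefix: combining Prefix with Delimiter should list one level inside a given folder instead of the bucket root. The following is only a sketch; the helper name list_one_level is hypothetical, and it handles a single response page (at most 1,000 entries):

```python
def list_one_level(s3_client, bucket, prefix=""):
    """Return (files, subfolders) found directly under `prefix`.

    Hypothetical helper: `s3_client` is expected to be a boto3 S3 client,
    and `prefix` should end with "/" when it names a folder.
    """
    rsp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    # Both keys are left out of the response when there is nothing to report.
    files = [obj["Key"] for obj in rsp.get("Contents", [])]
    subfolders = [p["Prefix"] for p in rsp.get("CommonPrefixes", [])]
    return files, subfolders
```

With a real client this would be called as list_one_level(boto3.client("s3"), "bucketname", "folder2/"), and for the bucket in the question it should report folder2/text1.txt and folder2/text2.txt as files.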

Note that if you have a large number of keys at the top level (more than 1,000), you will need to paginate your requests.
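
One way to paginate is boto3's built-in paginator for list_objects_v2, which follows the continuation tokens for you. This is a minimal sketch rather than a definitive implementation; the function name list_first_level is an assumption made for illustration:

```python
def list_first_level(s3_client, bucket):
    """Return (objects, folders) found directly under the bucket root.

    Hypothetical helper: `s3_client` is expected to be a boto3 S3 client.
    """
    objects, folders = [], []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        # Either key can be missing from a page, so default to an empty list.
        objects.extend(obj["Key"] for obj in page.get("Contents", []))
        folders.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
    return objects, folders
```

With a real client this would be called as list_first_level(boto3.client("s3"), "bucketname").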

Also, note that list_objects is essentially deprecated and you should use list_objects_v2.
