How to change permission recursively to folder with AWS s3 or AWS s3api

I'm trying to grant permissions to an existing account in S3.

The bucket is owned by the account, but the data was copied from another account's bucket.

When I try to grant permissions with the command:

aws s3api put-object-acl --bucket <bucket_name> --key <folder_name> --profile <original_account_profile> --grant-full-control emailaddress=<destination_account_email>

I receive the error:

An error occurred (NoSuchKey) when calling the PutObjectAcl operation: The specified key does not exist.

whereas if I run it on a single file, the command succeeds.

How can I make it work for a whole folder?

ACLs apply to individual objects, and a "folder" in S3 is just a key prefix rather than an object in its own right, which is why PutObjectAcl against the folder name fails with NoSuchKey. You will need to run the command individually for every object.

You can shortcut the process by using:

aws s3 cp --acl bucket-owner-full-control --metadata Key=Value --profile <original_account_profile> s3://bucket/path s3://bucket/path

That is, you copy the files onto themselves, but with an added ACL that grants full control to the bucket owner.

If you have sub-directories, then add --recursive.
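If you prefer boto3 over the CLI, a minimal sketch of the same copy-in-place trick might look like this (the bucket name and prefix are placeholders):

import boto3

s3 = boto3.client('s3')
BUCKET = 'bucket'   # placeholder bucket name
PREFIX = 'path/'    # placeholder key prefix

# Copy each object onto itself, adding the ACL that grants the bucket
# owner full control. MetadataDirective='REPLACE' is needed because
# S3 rejects a self-copy that changes nothing else.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get('Contents', []):
        s3.copy_object(
            Bucket=BUCKET,
            Key=obj['Key'],
            CopySource={'Bucket': BUCKET, 'Key': obj['Key']},
            ACL='bucket-owner-full-control',
            MetadataDirective='REPLACE',
        )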

This can only be achieved by using pipes. Try:

aws s3 ls s3://bucket/path/ --recursive | awk '{cmd="aws s3api put-object-acl --acl bucket-owner-full-control --bucket bucket --key "$4; system(cmd)}'

(Note: $4 is the key field of the listing output, so this breaks for keys that contain spaces.)

Use Python to set the permissions recursively:

#!/usr/bin/env python
import boto3
import sys

client = boto3.client('s3')
BUCKET = 'enter-bucket-name'

def process_s3_objects(prefix):
    """Set the ACL on every key under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                print(obj['Key'])
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # list_objects_v2 returns at most 1000 keys per call, so
        # follow the continuation token until the last page.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']

    print("failures:", failures)

def set_acl(key):
    # Grant full control to this account's canonical user ID.
    client.put_object_acl(
        GrantFullControl="id=%s" % get_account_canonical_id(),
        Bucket=BUCKET,
        Key=key
    )

def get_account_canonical_id():
    return client.list_buckets()["Owner"]["ID"]


process_s3_objects(sys.argv[1])

Python code like the following is more efficient; otherwise it takes considerably longer.

import boto3
import sys

client = boto3.client('s3')
BUCKET = 'mybucket'

def process_s3_objects(prefix):
    """Set the ACL on every key under the given prefix."""
    kwargs = {'Bucket': BUCKET, 'Prefix': prefix}
    failures = []
    while True:
        resp = client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            try:
                set_acl(obj['Key'])
            except Exception:
                failures.append(obj['Key'])
                continue
        # Follow the continuation token until the last page.
        if 'NextContinuationToken' not in resp:
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']
    print("failures:", failures)

def set_acl(key):
    print(key)
    client.put_object_acl(
        ACL='bucket-owner-full-control',
        Bucket=BUCKET,
        Key=key
    )


process_s3_objects(sys.argv[1])

This was my PowerShell-only solution. Note that as written it only prints the put-object-acl commands; you can review them and pipe the output to Invoke-Expression to actually run them:

aws s3 ls s3://BUCKET/ --recursive | %{ "aws s3api put-object-acl --bucket BUCKET --key "+$_.ToString().substring(30)+" --acl bucket-owner-full-control" }

The other answers are fine, but the fastest way is to use the aws s3 cp command with the option --metadata-directive REPLACE, like this:

aws s3 cp --recursive --acl bucket-owner-full-control s3://bucket/folder s3://bucket/folder --metadata-directive REPLACE

This gives speeds of between 50MiB/s and 80MiB/s.

The answer comes from a comment by John R, who suggested adding a "dummy" option such as --storage-class STANDARD. Whilst this works, it only gave me copy speeds between 5MiB/s and 11MiB/s.

The inspiration for trying this came from AWS's support article on the subject: https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-change-anonymous-ownership/

Note: if you encounter "Access Denied" for some of your objects, it's likely because you are using AWS credentials for the bucket-owning account, whereas you need to use the credentials of the account the files were copied from.
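In boto3 terms, that means building the session from the source account's profile before issuing the ACL calls; a short sketch (the profile, bucket, and key names here are hypothetical):

import boto3

# Use the profile of the account the files were copied FROM,
# since that account still owns the objects.
session = boto3.session.Session(profile_name='source-account')  # hypothetical profile
s3 = session.client('s3')

s3.put_object_acl(
    Bucket='bucket',            # placeholder bucket name
    Key='path/to/object',       # placeholder key
    ACL='bucket-owner-full-control',
)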

I used this Linux Bash shell one-liner to change ACLs recursively:

aws s3 ls s3://bucket --recursive | cut -c 32- | xargs -n 1 -d '\n' -- aws s3api put-object-acl --acl public-read --bucket bucket --key

It works correctly even if file names contain () characters.

I had a similar issue taking ownership of log objects in a fairly large bucket: 3,290,956 objects in total, 1.4 TB of data.

The solutions I could find were far too slow for that many objects, so I ended up writing some code able to do the job several times faster than aws s3 cp.

You will need to install the requirements:

pip install pathos boto3 click

#!/usr/bin/env python3
import logging
import os
import sys
import boto3
import botocore
import click
from time import time
from botocore.config import Config
from pathos.pools import ThreadPool as Pool

logger = logging.getLogger(__name__)

streamformater = logging.Formatter("[*] %(levelname)s: %(asctime)s: %(message)s")
logstreamhandler = logging.StreamHandler()
logstreamhandler.setFormatter(streamformater)


def _set_log_level(ctx, param, value):
    if value:
        ctx.ensure_object(dict)
        ctx.obj["log_level"] = value
        logger.setLevel(value)
        if value <= 20:
            logger.info(f"Logger set to {logging.getLevelName(logger.getEffectiveLevel())}")
    return value


@click.group(chain=False)
@click.version_option(version='0.1.0')
@click.pass_context
def cli(ctx):
    """
        Take object ownership of S3 bucket objects.
    """
    ctx.ensure_object(dict)
    ctx.obj["aws_config"] = Config(
        retries={
            'max_attempts': 10,
            'mode': 'standard'
        }
    )


@cli.command("own")
@click.argument("bucket", type=click.STRING)
@click.argument("prefix", type=click.STRING, default="/")
@click.option("--profile", type=click.STRING, default="default", envvar="AWS_DEFAULT_PROFILE", help="Configuration profile from ~/.aws/{credentials,config}")
@click.option("--region", type=click.STRING, default="us-east-1", envvar="AWS_DEFAULT_REGION", help="AWS region")
@click.option("--threads", "-t", type=click.INT, default=40, help="Threads to use")
@click.option("--loglevel", "log_level", hidden=True, flag_value=logging.INFO, callback=_set_log_level, expose_value=False, is_eager=True, default=True)
@click.option("--verbose", "-v", "log_level", flag_value=logging.DEBUG, callback=_set_log_level, expose_value=False, is_eager=True, help="Increase log_level")
@click.pass_context
def command_own(ctx, *args, **kwargs):
    ctx.obj.update(kwargs)
    profile_name = ctx.obj.get("profile")
    region = ctx.obj.get("region")
    bucket = ctx.obj.get("bucket")
    prefix = ctx.obj.get("prefix").lstrip("/")
    threads = ctx.obj.get("threads")
    pool = Pool(nodes=threads)
    logger.addHandler(logstreamhandler)
    logger.info(f"Getting ownership of all objects in s3://{bucket}/{prefix}")
    start = time()

    try:
        SESSION: boto3.Session = boto3.session.Session(profile_name=profile_name)
    except botocore.exceptions.ProfileNotFound as e:
        logger.warning(f"Profile {profile_name} was not found.")
        logger.warning(f"Falling back to environment variables for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN")
        AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
        AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
        AWS_SESSION_TOKEN = os.environ.get("AWS_SESSION_TOKEN", "")
        if AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY:
            if AWS_SESSION_TOKEN:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                                                               aws_session_token=AWS_SESSION_TOKEN)
            else:
                SESSION: boto3.Session = boto3.session.Session(aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
        else:
            logger.error("Unable to find AWS credentials.")
            sys.exit(1)

    s3c = SESSION.client('s3', config=ctx.obj["aws_config"])

    def bucket_keys(Bucket, Prefix='', StartAfter='', Delimiter='/'):
        Prefix = Prefix[1:] if Prefix.startswith(Delimiter) else Prefix
        if not StartAfter:
            del StartAfter
            if Prefix.endswith(Delimiter):
                StartAfter = Prefix
        del Delimiter
        for page in s3c.get_paginator('list_objects_v2').paginate(Bucket=Bucket, Prefix=Prefix):
            for content in page.get('Contents', ()):
                yield content['Key']

    def worker(key):
        logger.info(f"Processing: {key}")
        s3c.copy_object(Bucket=bucket, Key=key,
                        CopySource={'Bucket': bucket, 'Key': key},
                        ACL='bucket-owner-full-control',
                        StorageClass="STANDARD"
                        )

    object_keys = bucket_keys(bucket, prefix)
    pool.map(worker, object_keys)
    end = time()
    logger.info(f"Completed for {end - start:.2f} seconds.")


if __name__ == '__main__':
    cli()

Usage:

get_object_ownership.py own -v my-big-aws-logs-bucket /prefix

The bucket mentioned above took about 7 hours to process using 40 threads.

[*] INFO: 2021-08-05 19:53:55,542: Completed for 25320.45 seconds.

Some more speed comparison between the AWS CLI and this tool, on the same subset of data:

aws s3 cp --recursive --acl bucket-owner-full-control --metadata-directive 53.59s user 7.24s system 20% cpu 5:02.42 total

[*] INFO: 2021-08-06 09:07:43,506: Completed for 49.09 seconds.

One thing you can do to avoid needing to set ACLs on every individual object is to disable ACLs for the bucket. All objects in the bucket will then be owned by the bucket owner, and you can use policies instead of ACLs for access control.

You do this by setting the "Object Ownership" setting to "Bucket owner enforced". As per the AWS documentation, this is in fact the recommended setting:

For the majority of modern use cases in S3, we recommend that you disable ACLs by choosing the bucket owner enforced setting and use your bucket policy to share data with users outside of your account as needed. This approach simplifies permissions management and auditing.

You can set this in the web console by going to the "Permissions" tab for the bucket and clicking the "Edit" button in the "Object Ownership" section, then selecting the "ACLs disabled" radio button.

You can also use the AWS CLI. An example from the documentation:

aws s3api put-bucket-ownership-controls --bucket DOC-EXAMPLE-BUCKET --ownership-controls Rules=[{ObjectOwnership=BucketOwnerEnforced}]
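If you would rather do this from boto3, the equivalent call is put_bucket_ownership_controls; a minimal sketch, reusing the placeholder bucket name from the CLI example above:

import boto3

s3 = boto3.client('s3')

# Disable ACLs by enforcing bucket-owner ownership of all objects.
s3.put_bucket_ownership_controls(
    Bucket='DOC-EXAMPLE-BUCKET',
    OwnershipControls={
        'Rules': [{'ObjectOwnership': 'BucketOwnerEnforced'}]
    },
)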