Porting usage of s3curl.pl to Python

Question

I have been running the following command:

~/s3curl/s3curl.pl --id mapreduce -- -sf https://$SERVER/$PATH >> $TEMP_FILE

I want to port my script to Python.

I tried:

import boto3
client = boto3.client('s3')
response = client.get_object(Bucket=<server>, Key=<path>)

But I get an error:

botocore.exceptions.ClientError: An error occurred (AllAccessDisabled) when calling the GetObject operation: All access to this object has been disabled

What am I doing wrong?

Thanks!

Answer 1

It turns out that a file named .s3curl, located in the same directory as s3curl.pl, contains the user ID and the secret key.

I translated it into a YAML file named s3.yaml, containing:

awsSecretAccessKeys:
  mapreduce:
    id: <insert id here>
    key: <insert key here>
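s3curl.pl uses the id and key to sign each request with AWS Signature Version 2: an HMAC-SHA1 over a canonical "string to sign", base64-encoded and sent as `Authorization: AWS <id>:<signature>`. The following is a minimal standard-library sketch of just that signing step. The helper names `sign_v2`, `auth_header` and `http_date` are hypothetical, and it assumes a plain GET with no `x-amz-*` headers or sub-resources (those would have to be added to the string to sign).

```python
import base64
import hmac
from email.utils import formatdate
from hashlib import sha1


def http_date():
    # Current time as an RFC 1123 date in GMT; the same value must be sent
    # in the Date header and used in the string to sign.
    return formatdate(usegmt=True)


def sign_v2(secret_key, verb, date, resource, content_md5='', content_type=''):
    """Return a Signature V2 value (base64 of HMAC-SHA1) for a simple request."""
    # StringToSign = Verb \n Content-MD5 \n Content-Type \n Date \n Resource
    # (assumes no x-amz-* headers, so CanonicalizedAmzHeaders is empty)
    string_to_sign = '\n'.join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode('utf-8'),
                      string_to_sign.encode('utf-8'), sha1).digest()
    return base64.b64encode(digest).decode('ascii')


def auth_header(access_key_id, secret_key, date, resource):
    # Header value in the form "AWS <AccessKeyId>:<Signature>"
    return 'AWS {}:{}'.format(access_key_id,
                              sign_v2(secret_key, 'GET', date, resource))
```

For a GET with empty Content-MD5 and Content-Type, the joined string is exactly `'GET\n\n\n{date}\n{resource}'`, which is the `to_sign` value built in the solution code.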

The Python solution is:

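import base64
import hmac
from email.utils import formatdate
from hashlib import sha1

import requests
import yaml


def download_file_from_s3(s3_server, path, export_path):
    # s3_server is e.g. 'https://host', path is the resource, e.g. '/bucket/key'
    url = s3_server + path
    with open('s3.yaml') as f:
        s3_conf = yaml.safe_load(f)['awsSecretAccessKeys']['mapreduce']

    # The Date must be the current time in GMT (RFC 1123 format)
    now = formatdate(usegmt=True)
    # Signature V2 string to sign: verb, Content-MD5, Content-Type, date, resource
    to_sign = 'GET\n\n\n{}\n{}'.format(now, path)
    digest = hmac.new(s3_conf['key'].encode(), to_sign.encode(), sha1).digest()
    signature = base64.b64encode(digest).decode()
    auth = 'AWS {}:{}'.format(s3_conf['id'], signature)
    response = requests.get(url, headers={'Date': now, 'Authorization': auth}, stream=True)
    response.raise_for_status()

    # Append to the file, like `>> $TEMP_FILE` in the shell version
    with open(export_path, 'ab') as f:
        for block in response.iter_content(4096):
            f.write(block)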
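The original boto3 attempt most likely failed because `boto3.client('s3')` talks to AWS's own endpoint, was given the server name as the bucket, and had no access to the mapreduce credentials. A hedged sketch of the boto3 route follows, assuming the server is S3-compatible and that `$PATH` has the path-style form `bucket/key`. The names `split_s3_url` and `download_with_boto3` are hypothetical, and whether the server needs botocore's legacy `'s3'` (Signature V2) signer instead of the default SigV4 is an assumption to check against your server.

```python
import shutil
from urllib.parse import urlsplit


def split_s3_url(url):
    """Split a path-style URL such as https://server/bucket/some/key
    into (endpoint_url, bucket, key)."""
    parts = urlsplit(url)
    bucket, _, key = parts.path.lstrip('/').partition('/')
    endpoint = '{}://{}'.format(parts.scheme, parts.netloc)
    return endpoint, bucket, key


def download_with_boto3(url, access_key_id, secret_key, export_path):
    # Imported here so split_s3_url can be used without boto3 installed.
    import boto3
    from botocore.config import Config

    endpoint, bucket, key = split_s3_url(url)
    client = boto3.client(
        's3',
        endpoint_url=endpoint,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_key,
        # Assumption: the server expects Signature V2 and path-style URLs,
        # as s3curl.pl uses; drop signature_version to try SigV4 instead.
        config=Config(signature_version='s3', s3={'addressing_style': 'path'}),
    )
    body = client.get_object(Bucket=bucket, Key=key)['Body']
    # Append to the output file, mirroring `>> $TEMP_FILE`
    with open(export_path, 'ab') as f:
        shutil.copyfileobj(body, f)
```

The id and key would come from the same s3.yaml entry used above.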
