Boto3 - Disable automatic multipart upload

I'm using an S3-compatible backend that doesn't support MultipartUpload.

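I have a strange situation: when I upload a file, it completes fine on some servers, but on others boto3 automatically tries to upload the file using MultipartUpload. For testing purposes I'm uploading exactly the same file, to the same backend, region/tenant, bucket, etc.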

As the documentation shows, MultipartUpload is enabled automatically when needed:

  • Automatically switching to multipart transfers when a file is over a specific size threshold

Here are some logs.

Log when it automatically switches to MultipartUpload:

DEBUG:botocore.hooks:Event request-created.s3.CreateMultipartUpload: calling handler <function enable_upload_callbacks at 0x2b001b8>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [POST]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"POST /cassandra/samplefile.tgz?uploads HTTP/1.1" 501 None
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 09:12:48 GMT', 'transfer-encoding': 'chunked', 'content-type': 'application/xml;charset=UTF-8', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:
<?xml version='1.0' encoding='UTF-8'?>
<Error>
  <Code>NotImplemented</Code>
  <Message>The request requires functionality that is not implemented in the current release</Message>
  <RequestId>1450429968948</RequestId>
  <HostId>aGRpLmJvc3RoY3AuY2xvdWQuY29ycDoyNg==</HostId>
</Error>     
DEBUG:botocore.hooks:Event needs-retry.s3.CreateMultipartUpload: calling handler <botocore.retryhandler.RetryHandler object at 0x2a490d0>

Log from another server, for the same file, where it does not switch to multipart:

DEBUG:botocore.hooks:Event request-created.s3.PutObject: calling handler <function enable_upload_callbacks at 0x7f436c025500>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [PUT]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.awsrequest:Waiting for 100 Continue response.
DEBUG:botocore.awsrequest:100 Continue response seen, now sending request body.
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"PUT /cassandra/samplefile.tgz HTTP/1.1" 200 0
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 10:05:25 GMT', 'content-length': '0', 'etag': '"b407e71de028fe62fd9f2f799e606855"', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:

DEBUG:botocore.hooks:Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f436be1ecd0>
DEBUG:botocore.retryhandler:No retry needed.
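The two logs differ in the operation botocore issues: `CreateMultipartUpload` (a `POST ...?uploads`) versus a single `PutObject` (a `PUT`). The lines above are in the default `logging.basicConfig` format (`LEVEL:logger:message`), so enabling DEBUG on the root logger before uploading should reproduce them. The sketch below turns that comparison into a check you can run on each server's captured log; `classify_upload` is a hypothetical helper, not part of boto3, and it only matches the operation names as they appear in these logs.

```python
import logging

# DEBUG on the root logger prints botocore's events in the
# "DEBUG:botocore.hooks:..." shape shown in the logs above.
logging.basicConfig(level=logging.DEBUG)


def classify_upload(log_lines):
    """Hypothetical helper: report which S3 upload path botocore took.

    Returns "multipart" as soon as a CreateMultipartUpload event is seen,
    "single" if only PutObject events appear, and None if neither is found.
    """
    saw_put_object = False
    for line in log_lines:
        if "s3.CreateMultipartUpload" in line:
            return "multipart"
        if "s3.PutObject" in line:
            saw_put_object = True
    return "single" if saw_put_object else None
```

Any iterable of lines works, for example an open file containing the captured output.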

This is how I upload the file:

import boto3

# url, access_key and secret_key point at the S3-compatible backend
connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')

The question is: how can I disable this automatic switch to MultipartUpload?

I found a workaround: use S3Transfer with a TransferConfig that raises the threshold size, as follows:

import boto3
from boto3.s3.transfer import S3Transfer, TransferConfig

myconfig = TransferConfig(
    multipart_threshold=9999999999999999,  # workaround to 'disable' automatic multipart upload
    max_concurrency=10,
    num_download_attempts=10,
)

connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)
transfer = S3Transfer(connection, myconfig)

transfer.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')

Hope this helps someone.

I came across your question while looking into boto3.

Automatically switching to multipart transfers when a file is over a specific size threshold??

Yes, upload_file (whether from the client, a resource, or S3Transfer) automatically converts to a multipart upload; the default threshold size is 8 MB.
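To make the threshold rule concrete, here is a simplified model of the choice upload_file makes. It is a sketch, not boto3's code: the constant assumes the 8 MB default stated above (8 * 1024 * 1024 bytes), the `>=` comparison reflects my understanding of s3transfer's check, and the function names are hypothetical.

```python
import os

# Assumed default multipart threshold: 8 MB = 8 * 1024 * 1024 bytes.
DEFAULT_MULTIPART_THRESHOLD = 8 * 1024 * 1024


def would_use_multipart(file_size, threshold=DEFAULT_MULTIPART_THRESHOLD):
    """Simplified model: multipart once the file size reaches the threshold."""
    return file_size >= threshold


def would_use_multipart_for_path(path, threshold=DEFAULT_MULTIPART_THRESHOLD):
    """Same check, using the size of a local file on disk."""
    return would_use_multipart(os.path.getsize(path), threshold)


print(would_use_multipart(1024))             # False: well under 8 MB
print(would_use_multipart(8 * 1024 * 1024))  # True: exactly at the threshold
# The workaround's huge threshold keeps even a 50 MB file on the single-PUT path.
print(would_use_multipart(50 * 1024 * 1024, threshold=9999999999999999))  # False
```

Under this model, raising `multipart_threshold` above any file you will ever upload is what makes the TransferConfig workaround behave like "multipart disabled".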

If you don't want multipart, don't use the upload_file method at all; just use put_object, which never does a multipart upload.

import boto3

client = boto3.client('s3')

with open('/test.csv', 'rb') as f:
    client.put_object(Body=f, Bucket='mybucket', Key='test.csv')
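The put_object approach can be wrapped in a small reusable helper. This is a minimal sketch: `put_file` is a hypothetical name, and `client` is assumed to be a boto3 S3 client (anything with a compatible `put_object` method works). The file is opened in binary mode so the bytes are sent unchanged, and the `with` block closes it once the request returns.

```python
def put_file(client, path, bucket, key):
    """Upload a local file with one PutObject request, never multipart.

    Assumption: `client` is a boto3 S3 client, e.g. boto3.client('s3', ...).
    Returns whatever client.put_object returns (the response dict).
    """
    # Binary mode: send the raw bytes; the with-block closes the file afterwards.
    with open(path, "rb") as body:
        return client.put_object(Body=body, Bucket=bucket, Key=key)
```

For the question's setup this would be called as `put_file(connection, '/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')`. The trade-off is that the whole file travels in a single request, with no per-part retries or parallelism, and the backend's maximum single-PUT object size applies (5 GB on AWS S3; an S3-compatible backend may differ).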