Boto3 - Disable automatic multipart upload
I am using an S3-compatible backend that does not support MultipartUpload.
I have a strange situation: when I upload a file, some servers complete it normally, but on other servers boto3 automatically tries to upload the file using MultipartUpload. The file I am trying to upload is exactly the same test file, against the same backend, region/tenant, bucket, and so on...
As the documentation states, MultipartUpload is enabled automatically when needed:
- Automatically switching to multipart transfers when a file is over a specific size threshold
Here are the logs from the run that automatically switched to MultipartUpload:
DEBUG:botocore.hooks:Event request-created.s3.CreateMultipartUpload: calling handler <function enable_upload_callbacks at 0x2b001b8>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [POST]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"POST /cassandra/samplefile.tgz?uploads HTTP/1.1" 501 None
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 09:12:48 GMT', 'transfer-encoding': 'chunked', 'content-type': 'application/xml;charset=UTF-8', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:
<?xml version='1.0' encoding='UTF-8'?>
<Error>
<Code>NotImplemented</Code>
<Message>The request requires functionality that is not implemented in the current release</Message>
<RequestId>1450429968948</RequestId>
<HostId>aGRpLmJvc3RoY3AuY2xvdWQuY29ycDoyNg==</HostId>
</Error>
DEBUG:botocore.hooks:Event needs-retry.s3.CreateMultipartUpload: calling handler <botocore.retryhandler.RetryHandler object at 0x2a490d0>
Logs from another server that did not switch to multipart, for the very same file:
DEBUG:botocore.hooks:Event request-created.s3.PutObject: calling handler <function enable_upload_callbacks at 0x7f436c025500>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [PUT]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.awsrequest:Waiting for 100 Continue response.
DEBUG:botocore.awsrequest:100 Continue response seen, now sending request body.
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"PUT /cassandra/samplefile.tgz HTTP/1.1" 200 0
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 10:05:25 GMT', 'content-length': '0', 'etag': '"b407e71de028fe62fd9f2f799e606855"', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:
DEBUG:botocore.hooks:Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f436be1ecd0>
DEBUG:botocore.retryhandler:No retry needed.
I upload the file as follows:
import boto3

connection = boto3.client(service_name='s3',
                          region_name='',
                          api_version=None,
                          use_ssl=True,
                          verify=True,
                          endpoint_url=url,
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          aws_session_token=None,
                          config=None)

connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')
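For reference, the DEBUG output shown above can be reproduced by turning on boto3's stream logger before the upload; a minimal sketch (this only uses the standard boto3.set_stream_logger call, nothing backend-specific is assumed):

import logging
import boto3

# Route all boto3/botocore log records to stderr at DEBUG level; the
# request-created lines then reveal whether PutObject or
# CreateMultipartUpload is being issued.
boto3.set_stream_logger('', logging.DEBUG)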
The questions are:
- How can I disable MultipartUpload by default, or raise its threshold, so that the automatic switch to multipart upload is avoided?
- Is there any reason why one server uses automatic multipart for this file while the other does not?
I found a workaround: increasing the threshold size with S3Transfer and TransferConfig, as follows:
from boto3.s3.transfer import S3Transfer, TransferConfig

# Raise the multipart threshold far beyond any realistic file size,
# as a workaround to 'disable' automatic multipart upload.
myconfig = TransferConfig(
    multipart_threshold=9999999999999999,
    max_concurrency=10,
    num_download_attempts=10,
)

connection = boto3.client(service_name='s3',
                          region_name='',
                          api_version=None,
                          use_ssl=True,
                          verify=True,
                          endpoint_url=url,
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          aws_session_token=None,
                          config=None)

transfer = S3Transfer(connection, myconfig)
transfer.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')
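As a side note, newer boto3 versions also accept a TransferConfig directly through the Config parameter of the client's upload_file, so the separate S3Transfer object is not needed; a sketch assuming such a version:

from boto3.s3.transfer import TransferConfig

# Same workaround, passed straight to the client method
# (Config= is accepted by upload_file in later boto3 releases).
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz',
                       Config=TransferConfig(multipart_threshold=9999999999999999))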
Hope this helps someone.
I came across your question while looking into boto3:
"Automatically switching to multipart transfers when a file is over a specific size threshold"?
Yes: upload_file (whether from the client, the resource, or S3Transfer) automatically switches to multipart upload, with a default threshold size of 8 MB.
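That 8 MB default can be checked directly on TransferConfig; a quick sketch that just inspects the library default:

from boto3.s3.transfer import TransferConfig

# The library default: files at or above this byte count go multipart.
print(TransferConfig().multipart_threshold)  # 8388608 == 8 MB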
If you do not want multipart at all, never use the upload_file method; use the put_object method instead, which never uses multipart.
client = boto3.client('s3')
client.put_object(Body=open('/test.csv', 'rb'), Bucket='mybucket', Key='test.csv')
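One caveat: on AWS S3 a single PutObject call is capped at 5 GB (your S3-compatible backend may differ), so a size guard before falling back to put_object can help; a minimal sketch under that assumption:

import os

# AWS S3 rejects single-PUT uploads above 5 GB; compatible backends may vary.
SINGLE_PUT_LIMIT = 5 * 1024 ** 3

size = os.path.getsize('/test.csv')
if size > SINGLE_PUT_LIMIT:
    raise ValueError('file too large for a single put_object call: %d bytes' % size)
with open('/test.csv', 'rb') as body:
    client.put_object(Body=body, Bucket='mybucket', Key='test.csv')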