Preferred way of using Flask and S3 for large files

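I know this is a bit open-ended, but I am confused about which strategy/method to apply for a large file upload service developed with Flask and boto3. For smaller files everything works fine, but it would be really nice to hear your thoughts on what to do once the size exceeds 100 MB.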

What I have in mind is the following:

a) Stream the file to the Flask app using some kind of AJAX uploader. (What I am trying to build is just a REST interface using Flask-Restful. Any examples that combine these components, e.g. Flask-Restful, boto3 and streaming large files, are welcome.) The upload app is going to be (I believe) part of a microservices platform that we are building. I do not know whether there will be an Nginx proxy in front of the Flask app or whether it will be served directly from a Kubernetes pod/service. If it is served directly, is there something I have to change for large file uploads in the Kubernetes and/or Flask layer?

b) Use a direct JS uploader (like http://www.plupload.com/) to stream the file into the S3 bucket directly and, when finished, get the URL, pass it to the Flask API app and store it in the DB. The problem with this is that the credentials need to live somewhere in the JS, which is a security threat. (Not sure whether there are any other concerns.)

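Which of these (or something different that I have not considered at all) do you think is the best approach, and where can I find some code examples?

Thanks in advance.

[Edit]

I found this: http://blog.pelicandd.com/article/80/streaming-input-and-output-in-flask. The author is dealing with a situation similar to mine and proposes a solution, but he is opening a file that already exists on disk. What if I want to take the file as it is being uploaded and store it directly as a single object in an S3 bucket? I feel this can serve as the basis of a solution, but it is not the solution itself.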

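  1. As far as I know, Flask can only hold the whole HTTP request body in memory, so there is no feature such as disk buffering.
  2. The Nginx upload module is a good way to upload large files. The documentation is here
  3. You can also use HTML5 or Flash to send the file data in chunks and process the chunks in Flask, but that is more complicated.
  4. Try to find out whether S3 offers a one-time token.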
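
On point 4: S3 does support presigned URLs, which let the browser upload straight to the bucket without ever seeing the AWS credentials. They are time-limited rather than strictly one-time, so the URL can be reused until it expires. The following is only a minimal sketch, assuming boto3 on the server side; `make_upload_url`, its parameters and the 15-minute default are hypothetical names and values chosen for illustration.

```python
from typing import Any


def make_upload_url(s3_client: Any, bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a time-limited URL that accepts an HTTP PUT of `key` into `bucket`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The URL is signed locally with the server's credentials, so the
    credentials themselves never reach the browser.
    """
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )


# Possible use inside a Flask view (assumption, not taken from the thread):
#   import boto3
#   url = make_upload_url(boto3.client("s3"), "bucket-name", "filename")
#   return {"upload_url": url}
```

The JS uploader would then PUT the file to the returned URL and report back to the Flask API when it is done, which keeps the large body out of the Flask process entirely.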

Using the link posted above, I finally ended up doing the following. Please let me know if you think this is a good solution.

import boto3
from flask import Flask, request

.
.
.

@app.route('/upload', methods=['POST'])
def upload():
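    s3 = boto3.resource('s3', aws_access_key_id="key", aws_secret_access_key='secret', region_name='us-east-1')
    # upload_fileobj reads the request stream to the end and switches to a
    # multipart upload for large bodies; put(Body=request.stream.read(CHUNK_SIZE))
    # would store only the first CHUNK_SIZE bytes.
    s3.Object('bucket-name', 'filename').upload_fileobj(request.stream)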
.
.
.
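
Streaming a request body to S3 without holding all of it in memory comes down to reading it in parts and sending each part as it arrives. The sketch below spells that out with boto3's low-level multipart calls. It is an illustration under assumptions, not a definitive implementation: `stream_to_s3`, `_read_exactly` and the 8 MiB `PART_SIZE` are hypothetical names and values, and the client is passed in so the function can be tried without AWS access.

```python
from typing import Any, BinaryIO

# Assumed part size. S3 requires every part except the last to be at least 5 MiB.
PART_SIZE = 8 * 1024 * 1024


def _read_exactly(stream: BinaryIO, size: int) -> bytes:
    """Read up to `size` bytes, looping because a stream may return short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:  # end of stream
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def stream_to_s3(s3_client: Any, stream: BinaryIO, bucket: str, key: str,
                 part_size: int = PART_SIZE) -> int:
    """Upload `stream` to S3 part by part and return the number of parts sent."""
    upload_id = s3_client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            data = _read_exactly(stream, part_size)
            if not data:
                break
            resp = s3_client.upload_part(
                Bucket=bucket, Key=key, PartNumber=part_number,
                UploadId=upload_id, Body=data,
            )
            # S3 needs the ETag of every part to assemble the final object.
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
        if not parts:
            raise ValueError("empty request body")
        s3_client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already-uploaded parts do not linger in the bucket.
        s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

Inside the view this would be called as `stream_to_s3(boto3.client('s3'), request.stream, 'bucket-name', 'filename')`. Only one part is held in memory at a time, so peak memory use is bounded by `part_size` rather than by the size of the upload.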

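Alternatively, you can use the Minio-py client library, which is open source and compatible with the S3 API. It handles multipart uploads for you natively.

A simple put_object.py example: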

import os

from minio import Minio
# Note: minio-py 7.x and later expose S3Error in minio.error instead of ResponseError.
from minio.error import ResponseError

client = Minio('s3.amazonaws.com',
               access_key='YOUR-ACCESSKEYID',
               secret_key='YOUR-SECRETACCESSKEY')

# Put a file with the default content-type.
try:
    file_stat = os.stat('my-testfile')
    # Open in binary mode; the with-block closes the file after the upload.
    with open('my-testfile', 'rb') as file_data:
        client.put_object('my-bucketname', 'my-objectname', file_data, file_stat.st_size)
except ResponseError as err:
    print(err)

# Put a file with content-type 'application/csv'.
try:
    file_stat = os.stat('my-testfile.csv')
    with open('my-testfile.csv', 'rb') as file_data:
        client.put_object('my-bucketname', 'my-objectname', file_data,
                          file_stat.st_size, content_type='application/csv')
except ResponseError as err:
    print(err)

You can find the complete list of API operations, with examples, here

Installing the Minio-Py library

$ pip install minio

Hope this helps.

Disclaimer: I work for Minio.