Reading a docx file from s3 bucket with flask results in an AttributeError
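I ran into so many different errors that I don't even know which ones to mention, but this is not a credentials problem, since I can already upload files and read txt files. Now I want to read a docx.
In my index.html I created a form with just a text area for typing the exact file name and a submit input, which opens a new window showing the number of paragraphs in the docx file from my AWS S3 bucket.
The error I get is: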
AttributeError: 'StreamingBody' object has no attribute 'seek'
My code looks like this:
path = "s3://***bucket/"
bucket_name = "***bucket"

@app.route('/resultfiles', methods=["POST"])
def getdata():
    thefilename = request.form['file_name']
    if '.docx' in thefilename:
        object_key = thefilename
        file_object = client.get_object(Bucket=bucket_name, Key=object_key)
        body = file_object['Body']
        doc = docx.Document(body)
        docx_paras = len(doc.paragraphs)
    return render_template('resultfiles.html', docx_paras=docx_paras)
I looked at the python-docx documentation, specifically the Document constructor:
docx.Document(docx=None)
Return a Document object loaded from docx, where docx can be either a path to a .docx file (a string) or a file-like object. If docx is missing or None, the built-in default document “template” is loaded.
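It seems to need either a file-like object or a file path. We can turn the different representations we get from boto3 into a file-like object; here is some sample code: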
import io

import boto3
import docx

BUCKET_NAME = "my-bucket"

def main():
    s3 = boto3.resource("s3")
    bucket = s3.Bucket(BUCKET_NAME)

    object_in_s3 = bucket.Object("test.docx")
    object_as_streaming_body = object_in_s3.get()["Body"]
    print(f"Type of object_as_streaming_body: {type(object_as_streaming_body)}")
    object_as_bytes = object_as_streaming_body.read()
    print(f"Type of object_as_bytes: {type(object_as_bytes)}")

    # Now we use BytesIO to create a file-like object from our byte-stream
    object_as_file_like = io.BytesIO(object_as_bytes)

    # Et voila!
    document = docx.Document(docx=object_as_file_like)

    print(document.paragraphs)

if __name__ == "__main__":
    main()
Here is what it looks like:
$ python test.py
Type of object_as_streaming_body: <class 'botocore.response.StreamingBody'>
Type of object_as_bytes: <class 'bytes'>
[<docx.text.paragraph.Paragraph object at 0x00000258B7C34A30>]
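To tie this back to the route in the question, here is a rough sketch of the same idea wrapped in a helper that takes a client like the one the question already uses. A .docx file is a ZIP package, and reading a ZIP needs a seekable file, which is presumably why the raw StreamingBody fails. So that the sketch runs without AWS access or python-docx, it uses the standard zipfile module in place of docx.Document. FakeStreamingBody, FakeS3Client, s3_object_as_file_like, make_sample_archive and list_members are all hypothetical names made up for this illustration; none of them come from boto3 or python-docx.

```python
import io
import zipfile


class FakeStreamingBody:
    """Hypothetical stand-in for botocore's StreamingBody: has read(), no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, amt=None):
        return self._buffer.read(amt)


class FakeS3Client:
    """Hypothetical stand-in for an S3 client, serving one in-memory object."""

    def __init__(self, data: bytes):
        self._data = data

    def get_object(self, Bucket, Key):
        # Mirrors the shape used in the question: a dict with a "Body" stream.
        return {"Body": FakeStreamingBody(self._data)}


def s3_object_as_file_like(client, bucket_name, object_key):
    """Download the whole object and wrap its bytes in a seekable BytesIO."""
    response = client.get_object(Bucket=bucket_name, Key=object_key)
    return io.BytesIO(response["Body"].read())


def make_sample_archive() -> bytes:
    """Build a tiny ZIP in memory to play the role of a .docx file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_members(file_obj):
    """Return the archive's member names, or the error raised while opening it."""
    try:
        with zipfile.ZipFile(file_obj) as archive:
            return archive.namelist()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


client = FakeS3Client(make_sample_archive())

# Passing the raw stream fails because it cannot seek.
raw_body = client.get_object(Bucket="my-bucket", Key="test.docx")["Body"]
print(list_members(raw_body))

# Wrapping the bytes in BytesIO gives the reader something it can seek in.
print(list_members(s3_object_as_file_like(client, "my-bucket", "test.docx")))
```

In the route from the question, the equivalent change would be to pass docx.Document(s3_object_as_file_like(client, bucket_name, object_key)) instead of the raw body. Note that this reads the whole object into memory, which should be fine for typical documents but is worth keeping in mind for very large files.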