Python get file from an app on-the-fly (without saving it in file system)

Question

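I want users to submit an MS Word file to my application, process it with the python-docx library, and then return it. Because the file may be large, I don't want to save it to the file system after processing; I want to return it directly as a download.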

Getting the file from a stream - this works

import docx
from docx.document import Document 
from StringIO import StringIO

source_stream = StringIO(request.vars['file'].value)
document = docx.Document(source_stream)
source_stream.close()
process_doc(document)

Returning it as a stream - this does not work

The application does let the user download the file, but MS Word cannot open it and reports "because some part is missing or invalid".

def download(document, filename):
    import contenttype as c
    import cStringIO
    out_stream = cStringIO.StringIO()
    document.save(out_stream)  

    response.headers['Content-Type'] = c.contenttype(filename)
    response.headers['Content-Disposition'] = \
            "attachment; filename=%s" %  filename
    return out_stream.getvalue()

I found "Upload a StringIO object with send_file()", but that answer is for the Flask framework. I would rather stay with the web2py framework.

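Update 1

Someone suggested moving the file pointer to the beginning of the document data before sending it to the output stream. But how do I do that?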

Update 2

As @scanny suggested, I created an empty document,

document = docx.Document()

and used BytesIO to download it from the file object:

document = docx.Document() 
from io import BytesIO
out_stream = BytesIO()
document.save(out_stream)
filename = 'temporal_file.docx'
filepath = os.path.join(request.folder, 'uploads',filename )
try:
    with open(filepath, 'wb') as f:
        f.write(out_stream.getvalue())
    response.flash ='Success to open file for writing'
    response.headers['Content-Disposition'] = "attachment; filename=%s" % filename
    response.headers['Content-Type'] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    #response['X-Sendfile'] = filepath
    #response['Content-Length'] = os.stat(filepath).st_size
    return  out_stream.getvalue()

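As the code shows, I also wrote that empty file to the file system. I can easily fetch that copy manually and open it in MS Word.

So the question remains open: why is the MS Word file that is downloaded through the output stream corrupted and impossible to open in MS Word?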

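Update 3

I have removed python-docx from the path between the file and the output stream. The result is the same: once the download completes, the file cannot be opened in MS Word. Code: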

# we load without python-docx library
from io import BytesIO
try:
    filename = 'empty_file.docx'
    filepath = os.path.join(request.folder, 'uploads',filename )
    # read a file from file system (disk)
    with open(filepath, 'rb') as f: 
        out_stream = BytesIO(f.read())
    response.flash ='Success to open file for reading'
    response.headers['Content-Disposition'] = "attachment; filename=%s" % filename
    response.headers['Content-Type'] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    return out_stream.getvalue()
except Exception as e:
    response.flash ='Error open file for reading or download it - ' + filename
return

Answer 1

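I would first save to a file-like object and then copy that file-like object to a file (locally, with no download involved). That should bisect the range in which the problem can occur. By the way, I would use BytesIO rather than StringIO. It may not make a difference in 2.7, but it could, and StringIO won't work in Python 3 anyway: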

from io import BytesIO

# ... code that processes `document`
out_stream = BytesIO()
document.save(out_stream)
with open('test.docx', 'wb') as f:
    f.write(out_stream.getvalue())

If that doesn't work (test.docx can't be opened), you have narrowed the problem down to something that happens before the document.save() call.

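If it does work, you can try the download again, but pay particular attention to the type of return value the download method expects. What you have here is a sequence of bytes. If the method expects a file-like object, or perhaps a path, that could also be the problem.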
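One way to check the bytes at each stage of that bisection, sketched under some assumptions: a .docx file is a ZIP package, so the standard zipfile module can tell whether a byte string is still an intact archive. The helper name looks_like_docx and the tiny stand-in archive below are hypothetical and only illustrate the check; with the real application you would pass in out_stream.getvalue() or the bytes of the downloaded file.

```python
import zipfile
from io import BytesIO


def looks_like_docx(data):
    """Return True if `data` (bytes) is an intact ZIP package with a content-types part."""
    stream = BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return False
    stream.seek(0)
    with zipfile.ZipFile(stream) as archive:
        # testzip() returns the name of the first corrupt member, or None if all are fine
        return ("[Content_Types].xml" in archive.namelist()
                and archive.testzip() is None)


def make_sample_package():
    """Build a minimal stand-in archive (not a real Word document) for the demo."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


good = make_sample_package()
truncated = good[: len(good) // 2]  # simulates bytes damaged on the way to the client

print(looks_like_docx(good))
print(looks_like_docx(truncated))
```

If the bytes pass this check on the server but the downloaded file fails it, the damage happens somewhere between the return statement and the browser.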

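Moving the file pointer to the beginning (with out_stream.seek(0)) only makes sense when you return the file-like object itself, that is, return out_stream instead of return out_stream.getvalue(). The latter returns bytes, which of course have no file pointer. BytesIO (or StringIO) .getvalue() doesn't need the file cursor to be set; it always returns the entire contents of the object.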
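A minimal sketch of the difference, using a placeholder payload instead of a real document: read() starts at the cursor, so it returns nothing right after a write, while getvalue() ignores the cursor.

```python
from io import BytesIO

out_stream = BytesIO()
out_stream.write(b"example payload")  # the cursor now sits at the end of the data

at_end = out_stream.read()            # reads from the cursor, so nothing is left
whole = out_stream.getvalue()         # ignores the cursor and returns everything

out_stream.seek(0)                    # rewind before handing the stream to a reader
after_rewind = out_stream.read()      # now the reader sees the full contents

print(at_end, whole, after_rewind)
```

So seek(0) matters only if whatever consumes the result calls read() on the stream.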

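Also, I wouldn't rely on contenttype to get this right; I would spell out the content-type header instead: application/vnd.openxmlformats-officedocument.wordprocessingml.document. If contenttype misidentifies the file as a .doc format (pre-Word 2007) file rather than a .docx format (Word 2007 and later) file, that could also cause the problem.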
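As a sketch of spelling the header out, the helper below builds both download headers as a plain dict. The name docx_download_headers is hypothetical, and quoting the filename is a choice made here so that names with spaces survive; the question's code leaves it unquoted.

```python
DOCX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def docx_download_headers(filename):
    """Return the headers for serving `filename` as a .docx attachment."""
    return {
        "Content-Type": DOCX_CONTENT_TYPE,
        # quoted so that a filename containing spaces is passed through whole
        "Content-Disposition": 'attachment; filename="%s"' % filename,
    }


headers = docx_download_headers("report.docx")
print(headers)
```

Assuming response.headers behaves like a dict, as the assignments in the question suggest, the result could be applied with response.headers.update(headers).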


