Reading MS Word with Python

Question

I am trying to read an MS Word file using StringIO, but for some reason the output is a strange string.

from docx import Document
import cStringIO

files = "D:/Workspace/Python scripting/test.docx"

# Open the existing .docx file
document = Document(files)

# Save the document into an in-memory buffer and print the buffer's contents
f = cStringIO.StringIO()
document.save(f)
contents = f.getvalue()
print contents

Thanks in advance for your help.

Answer 1

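document.save(f) writes the document to f in .docx format, so f.getvalue() returns the raw bytes of the file. That is the same as reading the file with open(files, 'rb').read(), and since a .docx file is a ZIP archive of XML parts, printing those bytes produces the strange output you see. If you want the text in the document, use the python-docx API for that. I haven't used it before, but the documentation is here: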

https://python-docx.readthedocs.org/en/latest/index.html

It looks like you can use something like this:

paragraphs=document.paragraphs

This is a list of the Paragraph objects in the document. You can get the text of those paragraphs like this:

text="\n".join([paragraph.text for paragraph in paragraphs])

text will then contain the text of the document.

python

ms-word