In Python, how can I get part of a docx document?

Question

I want to get part of a docx document (for example, 10% of all the content) with Python 3. How can I do this? Thanks.

Answer 1

I would try this:

from itertools import islice
from math import floor

def docx(file, percent):
    # Read the file as plain text and count its lines
    with open(file) as f:
        lines = sum(1 for _ in f)
    # Number of lines to keep, rounded down
    no = floor(lines * percent / 100)
    # Return only the first `no` lines (an empty list when no == 0)
    with open(file) as f:
        return list(islice(f, no))

To test it, try:

print(docx('example.docx', 10))
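
A .docx file is a ZIP archive of XML parts rather than plain text, so reading it line by line with open() may fail or return compressed bytes. Below is a minimal sketch of the same "first N percent" idea applied to paragraphs, using only the standard library. The function names docx_paragraphs and first_percent_of_paragraphs are hypothetical, and the sketch assumes the body text lives in word/document.xml; headers and footers, which are stored in separate parts, are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET
from math import floor

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(file):
    """Return the text of every paragraph in the main document part."""
    with zipfile.ZipFile(file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def first_percent_of_paragraphs(file, percent):
    """Return the first `percent` percent of paragraphs, rounded down."""
    paragraphs = docx_paragraphs(file)
    return paragraphs[:floor(len(paragraphs) * percent / 100)]
```

It can be called the same way as the function above, for example print(first_percent_of_paragraphs('example.docx', 10)).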

Answer 2

A good way to interact with .docx files in Python is the docx2txt module.

If you have pip installed, you can open your terminal and run:

pip install docx2txt

Once you have the docx2txt module installed, you can import it:

import docx2txt

Then you can retrieve the text of the document and filter out just the part you want. The content of filename.docx is stored as a string in the variable text.

text = docx2txt.process("filename.docx")
print(text)

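Now that string can be manipulated with some basic built-in functions. The code snippet below prints the length of text, as returned by the len() function, and then slices the string to create a substring of roughly 10%.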

length = len(text)
print(length)  # prints 1000 for my sample document

text = text[:length // 10]  # keep the first 10% of the characters
print(text)  # prints 10% of the string

The complete code for this example is below. I hope this helps!

import docx2txt

text = docx2txt.process("/home/jared/test.docx")
print(text)

length = len(text)
print(length)  # prints 1000 for my sample document

text = text[:length // 10]  # keep the first 10% of the characters
print(text)  # prints 10% of the string


