Renaming .doc or .docx files with Python according to text from the document

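I am having trouble renaming .doc or .docx files based on a specific piece of text inside each document.

I already have this working for .txt files, using the following code: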

import os
import re

pat = r"ID number(\d{5})"   # Text to find in the file; group 1 becomes the new name
ext = '.txt'                # Type of file to process
mydir = ''                  # Path or directory where the files live

for arch in os.listdir(mydir):
    if not arch.endswith(ext):
        continue                      # Skip files of other types
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:
        continue                      # No ID found in this file
    newpath = os.path.join(mydir, s.group(1) + ext)
    if not os.path.exists(newpath):   # Do not overwrite an existing file
        os.rename(archpath, newpath)

Does anyone have any ideas on how to do this?

You need python-docx:

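import os
import re
from docx import Document

# pat and mydir are defined as in the question
for arch in os.listdir(mydir):
    if not arch.endswith('.docx'):
        continue
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    s = None
    for para in document.paragraphs:
        s = re.search(pat, para.text)
        if s is not None:
            break                     # Stop at the first paragraph that matches
    if s is None:
        continue
    newpath = os.path.join(mydir, s.group(1) + '.docx')
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)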

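I found the answer. The problem was on my side: I was searching the document's paragraphs for the value, but the value sits inside a table, and document.paragraphs does not include table text. What I needed was to read a specific cell.

Here is the result: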

import os
import re
from docx import Document

pat = r"(\d+)"      # Pattern for the value used as the new name
ext = '.docx'       # Type of file to process
mydir = ''          # Path or directory where the files live

for arch in os.listdir(mydir):
    if not arch.endswith(ext):
        continue
    archpath = os.path.join(mydir, arch)
    document = Document(archpath)
    table = document.tables[0]                   # First table in the document
    s = re.search(pat, table.cell(1, 2).text)    # Row 1, column 2 (zero-based)
    if s is None:
        continue
    newpath = os.path.join(mydir, s.group(1) + ext)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)
        print(newpath)                           # Report each file that was renamed
input("Press Enter to exit")
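
If the table or cell position is not known in advance, one option is to scan the body paragraphs and then every cell of every table. The following is only a minimal sketch: it assumes python-docx's `paragraphs`, `tables`, `rows`, `cells` and `text` attributes, and `iter_document_text` and `find_first_match` are hypothetical helper names, not part of the library. A stand-in object with made-up sample text replaces a real document so the snippet runs on its own.

```python
import re
from types import SimpleNamespace


def iter_document_text(document):
    """Yield the text of every body paragraph, then of every table cell."""
    for para in document.paragraphs:
        yield para.text
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell.text


def find_first_match(document, pattern):
    """Return group 1 of the first match in the document, or None."""
    for text in iter_document_text(document):
        match = re.search(pattern, text)
        if match is not None:
            return match.group(1)
    return None


def fake_cell(text):
    """Illustrative stand-in for a python-docx table cell."""
    return SimpleNamespace(text=text)


# Stand-in document (made-up content); with python-docx you would pass
# docx.Document(path) instead.
fake_doc = SimpleNamespace(
    paragraphs=[SimpleNamespace(text="Quarterly report")],
    tables=[SimpleNamespace(rows=[
        SimpleNamespace(cells=[fake_cell("Name"), fake_cell("Dept"), fake_cell("ID")]),
        SimpleNamespace(cells=[fake_cell("Ann"), fake_cell("Ops"), fake_cell("ID number12345")]),
    ])],
)
print(find_first_match(fake_doc, r"ID number(\d{5})"))  # prints 12345
```

In the rename loop, `find_first_match(document, pat)` would take the place of the fixed `table.cell(1, 2)` lookup, at the cost of possibly matching an unintended cell if the pattern is as loose as `(\d+)`.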

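Note that this approach only works for .docx files, the format used by Word 2007 and later. python-docx cannot read .doc files from earlier versions.

So my next project is to build a .doc to .docx converter.

Thanks to everyone for taking part.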
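
Since only .docx files can be opened this way, it may help to sort the directory before processing. A small sketch under stated assumptions: `split_word_files` is a hypothetical helper, and the `~$` check skips the lock files Word leaves next to an open document. The demonstration uses empty placeholder files in a temporary directory.

```python
import os
import tempfile


def split_word_files(directory):
    """Return (docx_files, doc_files) found directly in `directory`."""
    docx_files, doc_files = [], []
    for name in sorted(os.listdir(directory)):
        if name.startswith("~$"):
            continue  # Lock file created by Word, not a real document
        suffix = os.path.splitext(name)[1].lower()
        if suffix == ".docx":
            docx_files.append(name)
        elif suffix == ".doc":
            doc_files.append(name)
    return docx_files, doc_files


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.docx", "b.doc", "notes.txt", "~$a.docx"):
        open(os.path.join(tmp, name), "w").close()
    print(split_word_files(tmp))  # prints (['a.docx'], ['b.doc'])
```

The first list can go straight to the rename loop; the second list holds the files that would need converting first.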
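
One possible route for such a converter, offered only as a sketch and not as the approach taken here, is to call LibreOffice in headless mode. This assumes LibreOffice is installed and its `soffice` executable is on PATH; `build_convert_command` and `convert_to_docx` are hypothetical helper names, and `report.doc` and `converted` are placeholder values.

```python
import shutil
import subprocess


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to .docx."""
    return [soffice, "--headless", "--convert-to", "docx",
            "--outdir", out_dir, doc_path]


def convert_to_docx(doc_path, out_dir):
    """Run the conversion; return False if LibreOffice is not on PATH."""
    soffice = shutil.which("soffice")
    if soffice is None:
        return False
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return True


# Show the command that would be run for a placeholder file name.
print(build_convert_command("report.doc", "converted"))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to inspect what will be executed before touching any files.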