Renaming .doc or .docx with python according to a text from the document
I'm having trouble renaming .doc or .docx files based on specific text inside the document.
I've already got this working for .txt files with the following code:
```python
import os
import re

pat = r"ID number(\d{5})"  # text to look for; group 1 captures the five-digit ID
ext = '.txt'                # type of file the script is looking for
mydir = '.'                 # directory to scan (set this to your folder)

for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    # Skip subdirectories and files with other extensions
    if not os.path.isfile(archpath) or not arch.endswith(ext):
        continue
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name + ext)
    # Only rename if no file with the target name exists yet
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)
```
Does anyone have any ideas?
You need python-docx (`pip install python-docx`):
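```python
import os
import re

from docx import Document  # provided by the python-docx package

# pat and mydir are the same as in the question
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    if not arch.endswith('.docx'):
        continue
    document = Document(archpath)
    for para in document.paragraphs:
        s = re.search(pat, para.text)
        if s is None:
            continue
        name = s.group(1)
        newpath = os.path.join(mydir, name + '.docx')
        if not os.path.exists(newpath):
            os.rename(archpath, newpath)
        break  # stop at the first match; archpath has been renamed (or skipped)
```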
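Found the answer; the problem was on my end. I was searching the document for the value, but the value sits inside a table, so I needed to read a specific table cell instead.
Here is the result: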
```python
import os
import re

from docx import Document

pat = r"(\d+)"   # type of string/value used as the new name
ext = '.docx'    # type of file the script is looking for
mydir = '.'      # directory to scan (set this to your folder)

for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    # Skip other file types and Word's "~$" lock files
    if not arch.endswith(ext) or arch.startswith('~$'):
        continue
    document = Document(archpath)
    if not document.tables:
        continue  # this document has no table
    table = document.tables[0]
    # cell(row, col) is zero-based: second row, third column of the first table
    s = re.search(pat, table.cell(1, 2).text)
    if s is None:
        continue
    newpath = os.path.join(mydir, s.group(1) + ext)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)
        print(newpath)

input("Press Enter to exit")
```
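The paragraph search in the answer missed the ID because python-docx's `document.paragraphs` does not include text inside tables. If the ID is not always in the same cell, hard-coding `table.cell(1, 2)` is fragile. Below is a minimal sketch, assuming the first match anywhere in the body paragraphs or in any top-level table cell is the one you want. `iter_texts` and `first_match` are hypothetical helper names; the only python-docx features used are `Document.paragraphs`, `Document.tables`, `Table.rows`, `row.cells` and the `.text` attribute. Tables nested inside cells are not searched.

```python
import re


def iter_texts(document):
    """Yield the text of every body paragraph, then of every top-level table cell.

    `document` is assumed to be a python-docx Document. document.paragraphs
    does not include text inside tables, so the tables are walked separately.
    """
    for para in document.paragraphs:
        yield para.text
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell.text


def first_match(texts, pattern):
    """Return group 1 of the first regex match found in `texts`, or None."""
    regex = re.compile(pattern)
    for text in texts:
        match = regex.search(text)
        if match:
            return match.group(1)
    return None
```

In the loop above, `name = first_match(iter_texts(document), pat)` would replace the `document.tables[0]` / `table.cell(1, 2)` lookup, followed by a `None` check before renaming.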
Note that this method only works with .docx files (Word 2007 and later), because python-docx does not support the older .doc format.
So my next project is to write a .doc to .docx converter.
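For the .doc to .docx step, one option (not something proposed in this thread) is to let LibreOffice do the conversion from the command line with `soffice --headless --convert-to docx --outdir <dir> <file>`. The sketch below assumes LibreOffice is installed and its `soffice` executable is on the PATH (on some Linux installs the command is `libreoffice` instead). `build_convert_cmd` and `convert_doc_to_docx` are hypothetical helper names.

```python
import os
import subprocess


def build_convert_cmd(doc_path, outdir, soffice="soffice"):
    """Build the LibreOffice command line that converts one .doc file to .docx."""
    return [soffice, "--headless", "--convert-to", "docx", "--outdir", outdir, doc_path]


def convert_doc_to_docx(mydir, soffice="soffice"):
    """Convert every .doc file in `mydir` to .docx, writing the result into `mydir`.

    Assumes LibreOffice is installed; raises subprocess.CalledProcessError
    if a conversion command exits with an error.
    """
    for arch in os.listdir(mydir):
        # "report.docx".lower().endswith(".doc") is False, so .docx files are skipped
        if arch.lower().endswith(".doc"):
            doc_path = os.path.join(mydir, arch)
            subprocess.run(build_convert_cmd(doc_path, mydir, soffice), check=True)
```

The original .doc files are left in place, so the renaming script would then pick up only the new .docx copies.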
Thanks, everyone, for your input.