Walking directory tree and appending datestamps to file names
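I have a directory of about 900 Word, Excel and PDF files. My end goal is to scan only the PDF documents in the directory, move them into a single folder, add date stamps to them, and then search them for specific company names, returning the file name/date stamp where the text was found.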
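Gathering the PDFs into one folder, the first part of that goal, could look roughly like the sketch below. It is an illustration, not the asker's code: `collect_pdfs` is a hypothetical helper name, and it copies with `shutil.copy2` rather than moving (swap in `shutil.move` if the originals should go). `copy2` also carries over the modification time, so a date stamp added afterwards still reflects when each original was last saved. Because the same file name can appear in several year folders, it appends a counter instead of overwriting.

```python
import os
import shutil


def collect_pdfs(src_root, dest_dir):
    """Copy every .pdf found under src_root into the single folder dest_dir.

    Hypothetical helper. Returns the list of destination paths.
    shutil.copy2 also copies the modification time of each file.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_key = os.path.normcase(os.path.abspath(dest_dir))
    copied = []
    for root, dirs, files in os.walk(src_root):
        # Don't descend into dest_dir if it sits inside src_root.
        dirs[:] = [d for d in dirs
                   if os.path.normcase(os.path.abspath(os.path.join(root, d))) != dest_key]
        for name in files:
            if not name.lower().endswith('.pdf'):
                continue
            # Same name in several folders: add " (1)", " (2)", ... instead of overwriting.
            stem, ext = os.path.splitext(name)
            target = os.path.join(dest_dir, name)
            n = 1
            while os.path.exists(target):
                target = os.path.join(dest_dir, f'{stem} ({n}){ext}')
                n += 1
            shutil.copy2(os.path.join(root, name), target)
            copied.append(target)
    return copied
```

For example, `collect_pdfs(r'H:\PyTest', r'H:\PyTest\AllPDFs')` would gather the three copies of `HF Cheat Sheet.pdf` as `HF Cheat Sheet.pdf`, `HF Cheat Sheet (1).pdf` and `HF Cheat Sheet (2).pdf` (the destination folder name is only an example).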
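My first step with this code is to organize my files by stripping out what I don't need/copying the PDF files, while renaming each PDF so that its file name includes its creation date. However, I'm struggling to get the basics working. Here is my code so far, run on a test directory of a few files. For now I've set it up to print every folder, subfolder and file name to check that the traversal works, and it does: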
import os
import datetime

os.chdir(r'H:\PyTest')

def modification_date(filename):
    # Note: getctime is the creation time on Windows, not the last-modified time
    t = datetime.datetime.fromtimestamp(os.path.getctime(filename))
    return t.year, t.month

# Test that the function works
modification_date(r'H:\PyTest10\Oct\Meeting Minutes.docx')
# output: (2020, 10)

# Walk the main folder, each subfolder and each file, printing the name of each PDF found
for folderName, subfolders, filenames in os.walk(r'H:\PyTest'):
    print('the current folder is ' + folderName)
    for subfolder in subfolders:
        print('SUBFOLDER OF ' + folderName + ':' + subfolder)
    for filename in filenames:
        if filename.endswith('.pdf'):
            print(filename)
            #print(modification_date(filename))
Without the part I commented out at the end, print(modification_date(filename)), this prints the directory and name of every PDF:
the current folder is H:\PyTest
SUBFOLDER OF H:\PyTest:2010
SUBFOLDER OF H:\PyTest:2011
SUBFOLDER OF H:\PyTest:2012
the current folder is H:\PyTest10
SUBFOLDER OF H:\PyTest10:Dec
SUBFOLDER OF H:\PyTest10:Oct
the current folder is H:\PyTest10\Dec
HF Cheat Sheet.pdf
the current folder is H:\PyTest10\Oct
the current folder is H:\PyTest11
SUBFOLDER OF H:\PyTest11:Dec
SUBFOLDER OF H:\PyTest11:Oct
the current folder is H:\PyTest11\Dec
HF Cheat Sheet.pdf
the current folder is H:\PyTest11\Oct
the current folder is H:\PyTest12
SUBFOLDER OF H:\PyTest12:Dec
SUBFOLDER OF H:\PyTest12:Oct
the current folder is H:\PyTest12\Dec
HF Cheat Sheet.pdf
the current folder is H:\PyTest12\Oct
However, when I include print(modification_date(filename)) in my code, I get a FileNotFoundError. It seems the function doesn't know the directory path, which is why it fails.
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'HF Cheat Sheet.pdf'
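The error can be reproduced in a small, self-contained sketch. The directory layout below is hypothetical and only mirrors the question (a temporary folder stands in for `H:\PyTest`). `os.walk` yields bare file names, and a bare name is resolved against the current working directory, not against the folder being walked, so only the joined path finds the file.

```python
import os
import tempfile

results = []
# Hypothetical layout mirroring the question: <top>/2010/Dec/HF Cheat Sheet.pdf
with tempfile.TemporaryDirectory() as top:
    folder = os.path.join(top, '2010', 'Dec')
    os.makedirs(folder)
    open(os.path.join(folder, 'HF Cheat Sheet.pdf'), 'w').close()

    old_cwd = os.getcwd()
    os.chdir(top)  # plays the role of os.chdir(r'H:\PyTest')
    try:
        for root, _dirs, files in os.walk(top):
            for name in files:
                # A bare name is looked up relative to the current working directory...
                bare = os.path.exists(name)
                # ...while root + name points at where the file really is.
                joined = os.path.exists(os.path.join(root, name))
                results.append((name, bare, joined))
    finally:
        os.chdir(old_cwd)  # restore cwd before the temp dir is deleted

print(results)  # [('HF Cheat Sheet.pdf', False, True)]
```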
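Can anyone suggest how to edit this to get the date stamp and then rename each PDF to include it at the beginning or end of the name? I'm looking for the date the file was last saved.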
Many thanks
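Since the date wanted is when the file was last saved, `os.path.getmtime` (last modification) is likely a better fit than `os.path.getctime`, which returns the creation time on Windows and the last metadata change on Unix. A minimal sketch of a helper that formats that date as text for a file name might look like this; `date_stamp` is a hypothetical name, and the default `'%Y-%m-%d'` format is an assumption chosen because it sorts correctly and contains no characters Windows forbids in file names (such as `:`).

```python
import datetime
import os


def date_stamp(path, fmt='%Y-%m-%d'):
    """Return the last-modified ("last saved") date of path as a string.

    Hypothetical helper. Uses os.path.getmtime rather than getctime,
    and interprets the timestamp in local time.
    """
    ts = os.path.getmtime(path)
    return datetime.datetime.fromtimestamp(ts).strftime(fmt)
```

Passing `fmt='%Y-%m'` gives a year-month stamp such as `2020-10`, matching the `(year, month)` pair the original function returned.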
You have to use the variable folderName to build the full path of the file. It would be something like this:
for folderName, subfolders, filenames in os.walk(r'H:\PyTest'):
    print('the current folder is ' + folderName)
    for subfolder in subfolders:
        print('SUBFOLDER OF ' + folderName + ':' + subfolder)
    for filename in filenames:
        if filename.endswith('.pdf'):
            print(filename)
            # Join the folder path and the file name to get the full path
            print(modification_date(os.path.join(folderName, filename)))
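folderName (a variable usually called root) holds the path of the folder visited in the current iteration, starting from the path you passed to os.walk(). To get the full path of a file, you have to join that path with the file name.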
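The answer stops at printing the date. Applying the same `os.path.join` idea to the renaming step could look like the following sketch. `add_date_stamps` is a hypothetical helper; it assumes the "last saved" date (`os.path.getmtime`) and puts the stamp at the start of the name. Renaming does not change a file's modification time, and files that already start with their stamp are skipped, so running it twice is harmless. On Windows `os.rename` raises `FileExistsError` if the new name is already taken.

```python
import datetime
import os


def add_date_stamps(top, fmt='%Y-%m-%d'):
    """Prefix every .pdf under top with its last-modified date.

    Hypothetical helper: "HF Cheat Sheet.pdf" -> "2020-10-05 HF Cheat Sheet.pdf".
    Returns a list of (old_path, new_path) pairs.
    """
    renamed = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if not name.lower().endswith('.pdf'):
                continue
            old_path = os.path.join(root, name)  # full path, as in the answer
            stamp = datetime.datetime.fromtimestamp(
                os.path.getmtime(old_path)).strftime(fmt)
            if name.startswith(stamp):
                continue  # already stamped on a previous run
            new_path = os.path.join(root, f'{stamp} {name}')
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

To put the stamp at the end instead, split the name with `os.path.splitext(name)` and build `f'{stem} {stamp}{ext}'` (adjusting the already-stamped check to match).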