Getting NoneType Error When Using Regex to Change Filenames in Python
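I'm trying to use regex groups to rename a bunch of files, but I can't get it to work (even though regexr.com tells me it should be a valid regex statement). The 93,000 files I currently have all look like this: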
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt
I want them to look like this:
20120731McCONNELL2014sep19_at_182325.txt
and I want to ignore any file that starts with anything other than Mr., Mrs., or Ms.
But every time I run the script below, I get the following error:
Traceback (most recent call last):
File "changefilenames.py", line 11, in <module>
date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'
Thanks very much for your help. I apologize if this is a silly question. I'm just getting started with RegEx and Python and can't seem to figure this out.
import io
import os
import re
from dateutil.parser import parse

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mr"
    if filename.startswith("Mrs."):
        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
    if filename.startswith("Ms."):
        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
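I have already adjusted the script following the suggestions in "Using Regex to Change Filenames with Python", but it still doesn't work.
Edit: I made the following changes based on the answers below: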
for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)
            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
            os.rename(filename, new_filename)
            print new_filename
            print "All done with the Mr"
It printed this:
Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
File "changefilenames.py", line 19, in <module>
os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory
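After running a search, always make sure you have a match before doing any processing: re.search returns None when nothing matches, and calling .group() on None raises the AttributeError you are seeing. It looks like you have files that start with 'Mr.' but do not otherwise match your expression.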
if filename.startswith("Mr."):
    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m:  # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....
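I would also suggest not using both startswith('Mr.') and the regex, since your regex should already apply only to strings that start with 'Mr.', although you may want to add a '^' to the beginning of the regex to enforce this: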
m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:  # ^ added caret to signify start of string.
    date = m.group(2)
    name = m.group(1)
    ...
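Also, you may want to check which files do not match. With that much data you will often run into problems such as extra spaces or inconsistent capitalization, so you may need to make your regex more robust.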
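One way to act on this advice is sketched below. It is only an illustration, not the asker's or the answerer's code: the names `PATTERN` and `split_filename` are hypothetical, the sample list reuses filenames quoted in the question, and the pattern assumes the name part is a single word. It escapes the literal dots, covers all three titles in one expression, and collects the filenames that fail to match so they can be inspected.

```python
import re

# Hypothetical pattern: one expression for all three titles, with the
# literal dots escaped so that "." no longer matches any character.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s+"      # title, literal dot, whitespace
    r"(\w+)\."                  # group 1: single-word name
    r"(\d{4}-\d{2}-\d{2})\."    # group 2: date as YYYY-MM-DD
    r"(\w+)\.txt$"              # group 3: timestamp
)

def split_filename(filename):
    """Return (name, date, timestamp), or None if the filename does not match."""
    m = PATTERN.match(filename)
    return m.groups() if m else None

samples = [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt",
    "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt",
]

# Collect the filenames the pattern rejects so they can be reviewed by hand.
unmatched = [s for s in samples if split_filename(s) is None]
print(unmatched)
```

With these samples, only the first filename matches; "Mr. Adams was right..." is rejected because its name part contains spaces, which is the kind of case worth reviewing before loosening the pattern.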
You are passing the bare filename to os.rename, most likely without its path.
Consider the following layout:
yourscript.py
subdir/
- one
- two
This is similar to your code:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')
It throws an exception (the message is a bit clearer in Python 3):
FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'
because there is no file named two in the current working directory. But consider this:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir', fn), os.path.join('subdir', fn + '_moved'))
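This works because the full path is used. Instead of repeating 'subdir' again and again (or keeping it in a variable), you should perhaps change the working directory first:
import os
os.chdir('subdir')
for fn in os.listdir():
    print(fn)
    os.rename(fn, fn + '_moved')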
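Putting the two answers together, a rename loop might look like the sketch below. It is an assumption-laden illustration rather than a drop-in fix: `PATTERN` and `rename_speeches` are hypothetical names, the pattern assumes single-word names, and the demonstration runs against a temporary directory instead of the real speech folder. It checks for a match before reading groups, joins the directory onto both paths passed to os.rename, and builds the date by concatenating the captured digits, which yields the 20120731 form asked for (the question's "{dt.year}.{dt.month}.{dt.day}" format would give 2012.7.31 instead).

```python
import os
import re
import tempfile

# Hypothetical pattern: title, single-word name, date split into parts, timestamp.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s+(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$"
)

def rename_speeches(directory):
    """Rename matching files inside `directory`; return the list of new names."""
    renamed = []
    for filename in os.listdir(directory):
        m = PATTERN.match(filename)
        if not m:
            continue  # leave non-matching files untouched
        name, year, month, day, timestamp = m.groups()
        new_filename = year + month + day + name + timestamp + ".txt"
        # Join the directory onto both names so the rename does not depend
        # on the current working directory.
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, new_filename))
        renamed.append(new_filename)
    return renamed

# Demonstration on a throwaway directory with two filenames from the question.
with tempfile.TemporaryDirectory() as tmp:
    for fn in ["Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
               "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt"]:
        open(os.path.join(tmp, fn), "w").close()
    result = rename_speeches(tmp)
    remaining = sorted(os.listdir(tmp))

print(result)
print(remaining)
```

On a large collection it may be worth printing the planned old and new names first and only calling os.rename once the list looks right, since two different source files could in principle map to the same new name.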