Getting NoneType Error When Using Regex to Change Filenames in Python

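I'm trying to use regex groups to rename a batch of files, but I can't get it to work (even though regexr.com tells me the regular expression is valid). The 93,000 files I have all look like this: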

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt    
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt

I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

and to skip any file that starts with anything other than Mr., Mrs., or Ms.

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Any help would be much appreciated. I apologize if this is a silly question. I'm just getting started with regex and Python and can't seem to figure this out.

import io
import os
import re
from dateutil.parser import parse


for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):

        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

I've made adjustments based on the suggestions in Using Regex to Change Filenames with Python, but it still doesn't work.

EDIT: I made the following changes based on the answers below:

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)

            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

            os.rename(filename, new_filename)
            print new_filename

print "All done with the Mr"

It spat out this:

Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
  File "changefilenames.py", line 19, in <module>
    os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory

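After a search, you always want to make sure there is a match before doing any processing, because re.search returns None when the pattern does not match. It looks like some of your files start with 'Mr.' but don't otherwise match your expression.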

if filename.startswith("Mr."):

    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m: # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....

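I would also suggest not using both startswith('Mr.') and the regex, since your regex should already apply only to strings that start with 'Mr.', although you may want to add a '^' to the beginning of the regex to enforce this: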

m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:        # ^ added caret to signify start of string.
    date = m.group(2)
    name = m.group(1)
    ...

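Also, you may want to check which files don't match: with that much data you will often run into problems such as extra spaces or inconsistent capitalization, so you may need to consider making your regex more robust.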
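
As a rough illustration of that advice, here is a minimal sketch (Python 3) that covers all three titles with one pattern and collects the filenames that do not match. The names PATTERN, new_name, samples and unmatched are made up for this example, and the pattern assumes the surname is a single word, so adjust it to your data. Note that "{dt.year}.{dt.month}.{dt.day}" produces something like 2012.7.31, so the sketch builds the date from the captured digits instead to get 20120731.

```python
import re

# Hypothetical pattern: title, one-word surname, ISO date, timestamp, ".txt".
# Literal dots are escaped so "." no longer matches any character.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s+(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$"
)

def new_name(filename):
    """Return the renamed form, or None if the filename does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    name, year, month, day, timestamp = m.groups()
    return year + month + day + name + timestamp + ".txt"

samples = [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt",
    "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt",
    "Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt",
]

# Filenames the pattern rejects; worth inspecting before renaming anything.
unmatched = [fn for fn in samples if new_name(fn) is None]
```

Printing unmatched before any rename gives you a list of names to inspect, such as 'Mr. Adams was right...', where the part after the title is more than one word.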

You are passing a bare filename to os.rename, so the path is probably missing.

Consider the following layout:

yourscript.py
subdir/
  - one
  - two

This is similar to your code:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')

It raises an exception (the message is a bit nicer in Python 3):

FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'

That is because there is no file named two in the current working directory. But consider this:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir',fn), os.path.join('subdir', fn+'_moved'))

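This works because the full path is used. Instead of using 'subdir' again and again (or keeping it in a variable), you should probably change the working directory first: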

import os

os.chdir('subdir')

for fn in os.listdir():
    print(fn)
    os.rename(fn, fn + '_moved')
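
If you would rather not change the working directory, the os.path.join variant can be wrapped in a small helper. This is only a sketch: rename_all and its transform argument are hypothetical names invented here, and transform stands for whatever function you use to compute the new filename (returning None to leave a file alone).

```python
import os

def rename_all(directory, transform):
    """Rename files in `directory` using `transform(old_name) -> new_name`.

    `transform` returns None for files that should be left alone.
    Returns the list of filenames that were skipped.
    """
    skipped = []
    for fn in os.listdir(directory):
        new_fn = transform(fn)
        src = os.path.join(directory, fn)
        # Skip files the transform rejects, and never overwrite an existing file.
        if new_fn is None or os.path.exists(os.path.join(directory, new_fn)):
            skipped.append(fn)
            continue
        # Both paths include the directory, so the current working
        # directory does not matter.
        os.rename(src, os.path.join(directory, new_fn))
    return skipped
```

Called as rename_all('subdir', lambda fn: fn + '_moved'), this should behave like the loop above while leaving the working directory untouched.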