Renaming folders and files while os.walk()ing them misses some files after a directory name changes

I have a folder structure like this:

Template
  - Template1
  - Template2
TemplateTest
  - TemplateTest1
Config
  - TemplateConfig

I want to replace 'Template' with 'MyApp' in every file name and folder name.

This is my code:

for root, dirs, files in os.walk(path):
    for name in files:
        if name.startswith("Template"):
            replace = name.replace("Template",'MyApp')
            os.rename(os.path.join(root,name),os.path.join(root,replace))
    for name in dirs:
        if name.startswith("Template"):
            replace = name.replace("Template",'MyApp')
            os.rename(os.path.join(root,name),os.path.join(root,replace))

Strangely, this only renames the folders, plus the files whose parent folder did not need to be renamed. Like this:

MyApp
  - Template1
  - Template2
MyAppTest
  - TemplateTest1
Config
  - MyAppConfig

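But if I run this code twice, it does rename the files. I would like to know why, and how to change the code so that it renames everything I need.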

When in doubt, print it:

Create the data structure:

import os


for d in ["./Template","./TemplateTest","./Config"]:
    os.mkdir(d)

for f in ["./Template/Template1.txt","./Template/Template2.txt",
          "./TemplateTest/TemplateTest1.txt", "./Config/TemplateConfig.txt"]:
    with open(f,"w") as fh:
        fh.write(" ")

Test os.walk:

for root, dirs, files in os.walk("./"): # topdown is not given, so it defaults to True
    for name in files:
        if name.startswith("Template"):
            replace = name.replace("Template",'MyApp')
            print("renaming: ", os.path.join(root,name), " to ", os.path.join(root,replace))
            # os.rename(os.path.join(root,name),os.path.join(root,replace))
    for name in dirs:
        if name.startswith("Template"):
            replace = name.replace("Template",'MyApp')
            print("renaming: ", os.path.join(root,name), " to ", os.path.join(root,replace))
            # os.rename(os.path.join(root,name),os.path.join(root,replace))    

If you comment out the inner for loops and only print(root, dirs, files), the output is:

./             ['Config', 'Template', 'TemplateTest'] ['main.py']
./Config       []                                     ['TemplateConfig.txt']
./Template     []                                     ['Template1.txt', 'Template2.txt']
./TemplateTest []                                     ['TemplateTest1.txt']

If you put the for loops back in and replace the rename with a print, you get:

renaming:  ./Template  to  ./MyApp            # aha - works
renaming:  ./TemplateTest  to  ./MyAppTest    # aha - works 
renaming:  ./Config/TemplateConfig.txt  to  ./Config/MyAppConfig.txt   # works
renaming:  ./Template/Template1.txt  to  ./Template/MyApp1.txt       # folder not updated
renaming:  ./Template/Template2.txt  to  ./Template/MyApp2.txt       # folder also not updated
renaming:  ./TemplateTest/TemplateTest1.txt  to  ./TemplateTest/MyAppTest1.txt  # also not updated

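The documentation implies that changes you make on disk while iterating over os.walk() are not picked up automatically. The names in dirs were read before your rename, so the paths built from them afterwards still use the old directory names.

You are basically "changing an iterable while iterating it" ;o)

From the linked docs: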

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
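To make the quoted paragraph concrete, here is a minimal sketch of the "inform walk() about directories the caller renames" approach. The function name rename_top_down and its old/new parameters are made up for illustration and are not part of the question's code. It keeps the default topdown=True and writes each new directory name back into dirs, so that os.walk descends into the renamed folder:

```python
import os


def rename_top_down(top, old="Template", new="MyApp"):
    """Rename entries whose names start with `old`, walking top-down."""
    for root, dirs, files in os.walk(top):  # topdown defaults to True
        for name in files:
            if name.startswith(old):
                os.rename(os.path.join(root, name),
                          os.path.join(root, name.replace(old, new)))
        # Iterate by index so the new name can be stored back into dirs.
        for i, name in enumerate(dirs):
            if name.startswith(old):
                renamed = name.replace(old, new)
                os.rename(os.path.join(root, name),
                          os.path.join(root, renamed))
                # In-place update: os.walk reads dirs again after the yield,
                # so it will descend into the renamed directory.
                dirs[i] = renamed
```

Calling rename_top_down("./") on the test structure above should then rename the files inside Template and TemplateTest in a single pass, because those folders are visited under their new names.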

(Note that the call signature of os.walk is:

os.walk = walk(top, topdown=True, onerror=None, followlinks=False)

so you are passing True, None, and False, which are the defaults.)

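The problem has to do with the order in which os.walk traverses directories and files, and with which directories and files it traverses.

Specifically, it starts by reading the directory at path. This yields the following: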

['Template', 'TemplateTest', 'Config']

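All of these are directories, so the list of subdirectories to walk next is the same list, and there are no files. This is returned in the first iteration as three values: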

path
['Template', 'TemplateTest', 'Config']
[]

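Then your own code runs. It calls os.rename on Template, so that directory is now named MyApp, and on TemplateTest, so that directory is now named MyAppTest.

Next, the os.walk code tries to read the subdirectory Template. This fails, so nothing happens (onerror is None).

Next, the os.walk code tries to read the subdirectory TemplateTest. This fails too, so again nothing happens.

Finally, the os.walk code tries to read the subdirectory Config. This succeeds, and everything works.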
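The silent failures described above can be made visible by passing an onerror callback. The following is only a sketch for experimenting: walk_with_errors is a hypothetical helper, not something from the question. It renames directories on disk without updating dirs, exactly as the original code does, and collects the errors that os.walk would otherwise swallow:

```python
import os
import tempfile


def walk_with_errors(top):
    """Reproduce the failed descent and return the errors os.walk reports."""
    errors = []
    # onerror is called with the OSError raised while listing a directory.
    for root, dirs, files in os.walk(top, onerror=errors.append):
        for name in dirs:
            if name.startswith("Template"):
                # Rename on disk only; dirs still holds the old name.
                os.rename(os.path.join(root, name),
                          os.path.join(root, name.replace("Template", "MyApp")))
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for d in ["Template", "TemplateTest", "Config"]:
            os.mkdir(os.path.join(tmp, d))
        for err in walk_with_errors(tmp):
            # Each error names the old path that no longer exists.
            print(type(err).__name__, err.filename)
```

On the example structure this should report one error for Template and one for TemplateTest, and none for Config.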

There are two different solutions: you can set topdown to False, or you can update the list named dirs so that os.walk knows the new directory names. (Edit: I'm not sure topdown=False will fix it; that needs testing.)

(Edit: topdown=False really does fix it. This is described in the documentation:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.

)
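For reference, a minimal sketch of the topdown=False variant. The function name rename_bottom_up and its old/new parameters are assumptions for illustration. In bottom-up mode a directory is yielded only after everything below it has been yielded, so by the time a folder is renamed its contents have already been handled:

```python
import os


def rename_bottom_up(top, old="Template", new="MyApp"):
    """Rename entries whose names start with `old`, walking bottom-up."""
    for root, dirs, files in os.walk(top, topdown=False):
        # Files and subdirectories of root are handled the same way.
        # The subdirectories' own contents were already visited.
        for name in files + dirs:
            if name.startswith(old):
                os.rename(os.path.join(root, name),
                          os.path.join(root, name.replace(old, new)))
```

This version needs no bookkeeping on dirs, which is why it is usually the simpler of the two fixes when you do not also need to prune the walk.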