Traverse a directory and do the same thing, how to optimize?

Question

I have a directory of files whose structure looks like this:

folder1
+-- folder2
|   +-- A.py
|   +-- A.txt
+-- folder3
|   +-- folder4
|       +-- B.py
|       +-- B.txt
+-- C.py
+-- C.txt

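I want to find all the .py files under folder1 and write out their relative paths joined with _. For example, B.py should become folder1_folder3_folder4_B.py. This is what I have done so far: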

import os

folder1 = 'folder1'  # top-level folder to search
file_list = os.listdir(folder1)
for file in file_list:
    if len(file.split('.')) == 2 and file.split('.')[-1] == 'py':  # C.py
        print(folder1 + '_' + file)
    elif len(file.split('.')) == 2 and file.split('.')[-1] != 'py':  # C.txt
        pass
    else:  # no extension, so treat it as a subfolder
        file1_list = os.listdir(os.path.join(folder1, file))
        for file1 in file1_list:
            if len(file1.split('.')) == 2 and file1.split('.')[-1] == 'py':  # A.py
                print(folder1 + '_' + file + '_' + file1)
            elif len(file1.split('.')) == 2 and file1.split('.')[-1] != 'py':  # A.txt
                pass
            else:  # another subfolder, one level deeper
                file2_list = os.listdir(os.path.join(folder1, file, file1))
                for file2 in file2_list:
                    if len(file2.split('.')) == 2 and file2.split('.')[-1] == 'py':  # B.py
                        print(folder1 + '_' + file + '_' + file1 + '_' + file2)
                    elif len(file2.split('.')) == 2 and file2.split('.')[-1] != 'py':  # B.txt
                        pass
                    else:
                        pass  # Actually, I don't know what to write here

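There are two drawbacks:

(1) Although I can get the maximum depth of folder1, I don't know when to stop nesting for loops.

(2) The for loops repeat the same operation over and over, which can obviously be optimized.

Does anyone have a good answer?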

Answer 1

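You want to use recursion, which is a fancy name for a function calling itself. Write a function that takes a folder name as a parameter. It should run os.listdir() and loop through the results, just as you are doing now. When you reach a subfolder, just run the function again!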

Also, check out the .endswith() method, which is easier than all that splitting. You can simply ask if file.endswith('.py').
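A minimal sketch of that idea follows. The function name find_py and its prefix parameter are made up for illustration and do not come from the question. It also assumes os.path.isdir is an acceptable way to tell subfolders from files, rather than guessing from the name.

```python
import os

def find_py(folder, prefix=None):
    """Return the .py files under `folder` as underscore-joined relative names.

    `find_py` and `prefix` are illustrative names, not from the question.
    """
    if prefix is None:
        # Start every name with the top-level folder's own name, e.g. "folder1".
        prefix = os.path.basename(os.path.normpath(folder))
    names = []
    for entry in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, entry)
        if os.path.isdir(full_path):
            # Subfolder: call the same function again, one level deeper.
            names.extend(find_py(full_path, prefix + '_' + entry))
        elif entry.endswith('.py'):
            names.append(prefix + '_' + entry)
    return names
```

Calling find_py('folder1') on the tree from the question should give folder1_C.py, folder1_folder2_A.py and folder1_folder3_folder4_B.py. The recursion stops by itself once a folder has no more subfolders, so you never need to know the maximum depth.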

Answer 2

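os.walk recursively traverses a directory tree. fnmatch.fnmatch matches file names against wildcard patterns. os.path.relpath reduces a complicated root path to just the path of the subfolders.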
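To see the two helper functions on their own, here is a small sketch that works on plain strings and does not touch the file system. The variable names and sample paths are only for illustration.

```python
import os
from fnmatch import fnmatch

# fnmatch compares a file name against a shell-style wildcard pattern.
is_py = fnmatch('B.py', '*.py')        # True
is_txt_py = fnmatch('B.txt', '*.py')   # False

# relpath strips the root from a longer path, leaving only the subfolders.
root = 'testdir'
deep = os.path.join('testdir', 'folder1', 'folder3', 'folder4')
sub = os.path.relpath(deep, root)

# Swap the OS path separator for an underscore.
flat = sub.replace(os.path.sep, '_')
print(is_py, is_txt_py, flat)  # True False folder1_folder3_folder4
```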

Given testdir:

C:\TESTDIR
└───folder1
    │   C.py
    │   C.txt
    ├───folder2
    │       A.py
    │       A.txt
    └───folder3
        └───folder4
                B.py
                B.txt

and this code:

import os
from fnmatch import fnmatch

def magic(root):
    for path,dirs,files in os.walk(root):
        # fixes paths that start with .
        relpath = '' if root == path else os.path.relpath(path,root)
        for file in files:
            if fnmatch(file,'*.py'):
                name = os.path.join(relpath,file)
                yield name.replace(os.path.sep,'_')

root = r'.\testdir' # A path that starts with . for testing

for name in magic(root):
    print(name)

Output:

folder1_C.py
folder1_folder2_A.py
folder1_folder3_folder4_B.py

You should consider what you want to happen if file names contain underscores, though.
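To make that caveat concrete, here is a small sketch in which two different files end up with the same flattened name. The helper flatten and the sample folder names are invented for illustration, and the double-underscore separator is just one possible workaround, assuming it never occurs in your real names.

```python
def flatten(parts, sep='_'):
    """Join path components with `sep` (hypothetical helper for illustration)."""
    return sep.join(parts)

# Folder "my_folder" containing "A.py" ...
a = flatten(['folder1', 'my_folder', 'A.py'])
# ... and folder "my" containing "folder_A.py" collapse to the same name.
b = flatten(['folder1', 'my', 'folder_A.py'])
print(a == b)  # True: both are "folder1_my_folder_A.py"

# One possible way out: a separator assumed not to appear in any name.
c = flatten(['folder1', 'my_folder', 'A.py'], sep='__')
d = flatten(['folder1', 'my', 'folder_A.py'], sep='__')
print(c == d)  # False
```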

python

for-loop

listdir