Detecting Multiple Files from a List within Subdirectories (Python)

Question

I have a list of files and I want to detect whether they exist within subdirectories. I am very close, but I am stuck on the last step (step 5).

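Steps taken

1. Get the file names from the provided text file
2. Save the file names as a list
3. Loop through the previously saved list of file names
4. Walk the directory and its subdirectories to determine whether each file exists
5. Save the file names that were found in a second list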

The provided text file contains a list such as:

testfile1.txt
testfile2.txt
testfile3.txt
testfile4.txt
testfile5.txt

Of these, only testfile1-4 actually exist within the (sub)directories.

The expected output is a list such as ['testfile1.txt', 'testfile2.txt', 'testfile3.txt', 'testfile4.txt'].

Code

import os.path
from os import path
import sys

file = sys.argv[1]
#top_dir = sys.argv[2]
cwd = os.getcwd()

with open(file, "r") as f: #Step 1
    file_list = []
    for line in f:
        file_name = line.strip()
        file_list.append(file_name) #Step 2
    print(file_list)
    for file in file_list: #Step 3
        detected_files = []
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            if file in files:
                print(file)
                print("Files Found")
                detected_files.append(file) #Step 5
                print(detected_files)

What it outputs:

Files Found
testfile1.txt
['testfile1.txt']
Files Found
testfile2.txt
['testfile2.txt']
Files Found
testfile3.txt
['testfile3.txt']
Files Found
testfile4.txt
['testfile4.txt']

Answer 1

Your current process looks like this:

with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        detected_files = []
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...

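You can see that on every iteration of for file in file_list: you create a new, empty detected_files list, which discards whatever was saved before. That is why each printed list contains only one name.

detected_files should be created once, before the loop: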

detected_files = []
with open(file, "r") as f: #Step 1
    ...
    for file in file_list: #Step 3
        ...
        for dir, sub_dirs, files in os.walk(cwd): #Step 4
            ...
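
For reference, the outline above could be filled in roughly as follows. This is only a sketch: the function name find_listed_files and its parameters names and top_dir are made up for illustration, and the break after the first match is an added assumption so that a name present in several subdirectories is recorded only once.

```python
import os


def find_listed_files(names, top_dir):
    """Return the entries of `names` that exist somewhere under `top_dir`."""
    detected_files = []  # created once, before any loop
    for name in names:  # Step 3
        for _dirpath, _subdirs, files in os.walk(top_dir):  # Step 4
            if name in files:
                detected_files.append(name)  # Step 5
                break  # assumption: record each name only once
    return detected_files
```

Note that this still walks the directory tree once per file name, which is what the set-based approach below avoids.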

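I would use a set for membership testing and keep all of the found file names in a set as well (to avoid duplicates). This also means the directory tree is walked only once instead of once per file name.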

detected_files = set()
with open(file, "r") as f: #Step 1
    file_list = set(line.strip() for line in f)
for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)

If you like, you can short-circuit the process once all of the files have been found.

for dir, sub_dirs, files in os.walk(cwd): #Step 4
    found = file_list.intersection(files)
    detected_files.update(found)
    if detected_files == file_list: break
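
Putting the pieces together, a self-contained version might look like the sketch below. The function name detect_files and the parameters list_path and top_dir are hypothetical. Skipping blank lines and returning a sorted list are additions of mine (sets are unordered, and the question asks for a list), not part of the snippets above.

```python
import os


def detect_files(list_path, top_dir):
    """Return a sorted list of the names in `list_path` found under `top_dir`."""
    with open(list_path, "r") as f:  # Steps 1 and 2
        # Assumption: blank lines in the text file are ignored.
        wanted = {line.strip() for line in f if line.strip()}

    detected = set()
    for _dirpath, _subdirs, files in os.walk(top_dir):  # Step 4
        detected.update(wanted.intersection(files))  # Step 5
        if detected == wanted:
            break  # everything has been found, stop walking
    return sorted(detected)
```

With the question's setup it could be called as detect_files(sys.argv[1], os.getcwd()), after importing sys.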
