Python: for loop not looping over all the files

Question

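I am trying to loop over some compressed files (extension ".gz") and have run into a problem. I want to perform a specific action the first time I encounter a file ending in 'aa'. It can be any such file, not necessarily the first one in the list. Only then should Python check whether the folder contains other "aa" files and, if so, apply a second rule to them (there may be one or many "aa" files). Finally, a third rule must be applied to all the other files, those that do not end in "aa".

However, when I run the code below, not all the files are processed.

What am I doing wrong?

Thanks!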

import os
import pandas as pd

inputPath = "write your path"
fileExt = r".gz"
flag = False  # becomes True once the first 'aa' file has been processed

for item in os.listdir(inputPath): # loop through items in dir
    if item.endswith(fileExt): # check for ".gz" extension
        full_path = os.path.join(inputPath, item) # get full path of files

        if item.endswith('aa' + fileExt) and flag == False:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1") #from gzip to pandas df
            # do something
            flag = True
            print('1 rule:', "The item processed is ", item)

        elif item.endswith('aa' + fileExt) and flag == True:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1") #from gzip to pandas df
            # do something else
            print('2 rule:', "The item processed is ", item)

        elif not (item.endswith('aa' + fileExt)) and flag == True:
            df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1") #from gzip to pandas df
            # do something else
            print('3 rule:', "The item processed is ", item)

I think this happens because Python iterates over the file list in alphabetical order and then ignores the other files. How can I fix this?

LIST OF FILES:

File_202112311aa.gz
File_20211231ab.gz
File_20211231.gz
File_20211231aa.gz

OUTPUT
1 rule: The item processed is  File_202112311aa.gz
3 rule: The item processed is  File_20211231ab.gz
2 rule: The item processed is  File_20211231aa.gz

Answer 1

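Mostly untested, but something along the following lines should work.

This code first processes one file ending in 'aa.gz' (note: not all files ending in 'aa.gz' are processed first, since the question does not say that is required), then processes the remaining files. The remaining files come in no particular order: the order depends on how Python was built on the system and on what the (file) system does by default, and it is not guaranteed.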

import glob
import pandas as pd

# Obtain an unordered list of compressed files
filenames = glob.glob("*.gz")

# Now find a filename ending with 'aa.gz'
for i, filename in enumerate(filenames):
    if filename.endswith('aa.gz'):
        firstfile = filenames.pop(i)
        # We immediately break out of the loop, 
        # so we're safe to have altered `filenames`
        break
else:  
    # the sometimes useful and sometimes confusing else part 
    # of a for-loop: what happens if `break` was not called:
    raise ValueError("no file ending in 'aa.gz' found!")

# Ignoring the `full_path` part
df = pd.read_csv(firstfile, compression='gzip', header=0, sep='|', encoding="ISO-8859-1")
# do something
print(f"1 rule: The file processed is {firstfile}")
          
# Process the remaining files
for filename in filenames:
    df = pd.read_csv(filename, compression='gzip', header=0, sep='|', encoding="ISO-8859-1")
    if filename.endswith('aa.gz'):
        # do something
        print(f"2 rule: The file processed is {filename}")
    else:
        # do something else
        print(f"3 rule: The file processed is {filename}")
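
Because the order of the list is not guaranteed, it can help to sort the names when a reproducible run matters. The following is only a minimal sketch, not part of the answer above: the helper name `order_files` is hypothetical, and it assumes that alphabetical order within each group is acceptable. Unlike the loop above, it puts every 'aa.gz' file ahead of the others, so the first element would get rule 1, any further 'aa.gz' files rule 2, and the rest rule 3.

```python
def order_files(filenames, suffix="aa.gz"):
    """Return names ending in `suffix` first; each group is sorted alphabetically."""
    # The sort key is a tuple: False sorts before True, so names that end
    # with the suffix come first, and the name itself breaks ties.
    return sorted(filenames, key=lambda name: (not name.endswith(suffix), name))


# File names taken from the question, in an arbitrary starting order
names = [
    "File_20211231ab.gz",
    "File_20211231.gz",
    "File_20211231aa.gz",
    "File_202112311aa.gz",
]
ordered = order_files(names)
print(ordered)
```

In a real run, `names` would be the result of `glob.glob("*.gz")`.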

Answer 2

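Others here have given you more optimized solutions, but this is to answer your original question of why not all the files were processed.

In your code, you have three conditions under which a file is processed:

It is an *aa.gz file and it is the first one found.
It is an *aa.gz file and it is the second or a later *aa.gz file found.
It is not an *aa.gz file and an *aa.gz file has been found before it.

So it skips every non-*aa.gz file until it encounters the first *aa.gz file: while flag is still False, such a file matches none of the three branches.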
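
To see the effect, the branch logic can be replayed on the file names alone, without reading any data. This is a minimal sketch: the function name `classify` is hypothetical, and the listing order is an assumed one in which a non-'aa' file happens to come first, which would be consistent with the output shown in the question.

```python
def classify(filenames, file_ext=".gz"):
    """Replay the question's if/elif chain; map each name to the rule it hits, or None."""
    flag = False
    result = {}
    for item in filenames:
        if not item.endswith(file_ext):
            continue  # the original code ignores anything that is not a .gz file
        is_aa = item.endswith("aa" + file_ext)
        if is_aa and not flag:
            flag = True
            result[item] = 1
        elif is_aa and flag:
            result[item] = 2
        elif not is_aa and flag:
            result[item] = 3
        else:
            # Not an 'aa' file and no 'aa' file seen yet: no branch matches,
            # so the original loop silently skips this file.
            result[item] = None
    return result


# Assumed listing order (os.listdir returns names in arbitrary order)
listing = [
    "File_20211231.gz",
    "File_202112311aa.gz",
    "File_20211231ab.gz",
    "File_20211231aa.gz",
]
for name, rule in classify(listing).items():
    print(name, "->", rule)
```

With this order, `File_20211231.gz` maps to `None`, meaning the original loop never processes it.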

