How to iterate through files using pathlib.glob() when files share very similar names

Question

My directory looks like this:

P1_AAA_NOT_SAMPLE.csv
P1_AAA_SAMPLE.csv
P1_BBB_NOT_SAMPLE.csv
P1_BBB_SAMPLE.csv
P1_CCC_NOT_SAMPLE.csv
P1_CCC_SAMPLE.csv

P2_AAA_NOT_SAMPLE.csv
P2_AAA_SAMPLE.csv
P2_BBB_NOT_SAMPLE.csv
P2_BBB_SAMPLE.csv
P2_CCC_NOT_SAMPLE.csv
P2_CCC_SAMPLE.csv

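How can I iterate through the files in this directory using pathlib.glob() if I only want to capture the SAMPLE files (i.e., I don't want the NOT_SAMPLE files)?

My code looks like this: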

from pathlib import Path

file_path = r'C:\Users\HP\Desktop\My Directory'

for fle in Path(file_path).glob('P*_*_SAMPLE.csv'):
    # do something with each SAMPLE file
    pass

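But this code captures both the SAMPLE files and the NOT_SAMPLE files, because the second * can also match text such as AAA_NOT. Is there a way to adjust the wildcards or the glob() part so that only the SAMPLE files are captured, preferably using pathlib?

Thanks in advance.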

Answer 1

Like this: if 'NOT' is not in the file name, do something.

Put the check right after your for line:

for fle in Path(file_path).glob('P*_*_SAMPLE.csv'):
    if 'NOT' not in str(fle):
        # do something with the SAMPLE file
        pass
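One caveat with the check above: str(fle) is the whole path, so a parent folder whose name happens to contain 'NOT' would exclude every file. Here is a minimal sketch that tests only the file name via fle.name. The helper name sample_files and the temporary demo directory are made up for illustration and are not part of the original question.

```python
from pathlib import Path
import tempfile


def sample_files(directory):
    """Yield files matching the glob whose file name lacks 'NOT_SAMPLE'."""
    for fle in Path(directory).glob('P*_*_SAMPLE.csv'):
        # fle.name is only the final path component, so the names of
        # parent folders cannot cause a false exclusion.
        if 'NOT_SAMPLE' not in fle.name:
            yield fle


# Demo in a throwaway directory whose own name deliberately contains 'NOT'.
with tempfile.TemporaryDirectory(prefix='NOT_SAMPLE_demo_') as tmp:
    for name in ('P1_AAA_SAMPLE.csv', 'P1_AAA_NOT_SAMPLE.csv'):
        (Path(tmp) / name).touch()
    found = sorted(p.name for p in sample_files(tmp))
    print(found)
```

With str(fle) instead of fle.name, the demo directory's own name would filter out both files.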

Answer 2

You can filter in a generator expression (or a list comprehension), like this:

for fle in (p for p in Path(file_path).glob('P*_*_SAMPLE.csv') if 'NOT_SAMPLE' not in str(p)):
    pass  # do something with each SAMPLE file

Or build the list beforehand:

valid_paths = [p for p in Path(file_path).glob('P*_*_SAMPLE.csv') if 'NOT_SAMPLE' not in str(p)]

for fle in valid_paths:
    pass  # do something with each SAMPLE file
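If you would rather tighten the pattern itself, here are two possible sketches. Both rest on assumptions about the naming scheme that the question only implies: the first assumes a single character after P and exactly three characters in the middle token, and the second assumes the middle token never contains an underscore. The names SAMPLE_RE, by_glob and by_regex, and the temporary demo directory, are made up for illustration.

```python
from pathlib import Path
import re
import tempfile

# Assumed scheme: P<digits>_<token without underscores>_SAMPLE.csv
SAMPLE_RE = re.compile(r'P\d+_[^_]+_SAMPLE\.csv')

names = [
    'P1_AAA_NOT_SAMPLE.csv',
    'P1_AAA_SAMPLE.csv',
    'P2_BBB_NOT_SAMPLE.csv',
    'P2_BBB_SAMPLE.csv',
]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in names:
        (folder / name).touch()

    # Option 1: '?' matches exactly one character, so the middle token
    # cannot stretch to swallow 'AAA_NOT'.
    by_glob = sorted(p.name for p in folder.glob('P?_???_SAMPLE.csv'))

    # Option 2: glob broadly, then keep only names the regex fully matches.
    by_regex = sorted(
        p.name
        for p in folder.glob('P*_SAMPLE.csv')
        if SAMPLE_RE.fullmatch(p.name)
    )

print(by_glob)
print(by_regex)
```

The glob-only option stays purely within pathlib but breaks if the token lengths vary. The regex option is more tolerant of that at the cost of an extra import.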

Tags: python, glob, pathlib