How to iterate through a directory and read in 2 files at each iteration using pathlib.Path().glob()

Using pathlib.Path().glob(), how can I iterate through a directory and read in 2 files at each iteration?

Suppose my directory C:\Users\server\Desktop\Dataset looks like this:

P1_mean_fle.csv
P2_mean_fle.csv
P3_mean_fle.csv
P1_std_dev_fle.csv
P2_std_dev_fle.csv
P3_std_dev_fle.csv

If I only wanted to read in 1 file at each iteration of Pi, my code would look like this:

from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'

for i, fle in enumerate(Path(file_path).glob(param_file)):
    mean_fle = pd.read_csv(fle).values

    results = tuning(mean_fle)  #tuning is some function which takes in the file mean 
                                #and does something with this file

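Now, how do I read in 2 files at each iteration of Pi? The code below does not work, because param_file can only hold one file-name pattern at a time. If there is a way to do this with pathlib, I would appreciate it.
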
from pathlib import Path
import pandas as pd

param_file = 'P*' + '_mean_fle.csv'
param_file = 'P*' + '_std_dev_fle.csv'  #this is wrong

for i, fle in enumerate(Path(file_path).glob(param_file)):  #this is wrong inside the glob() part
    mean_fle = pd.read_csv(fle).values
    std_dev_fle = pd.read_csv(fle).values

    results = tuning(mean_fle, std_dev_fle)  #tuning is some function which takes in the two files mean 
                                             #and std_dev and does something with these 2 files

Thanks in advance.

I suggest two approaches:

1. If you are sure that your file numbering has no gaps, you can do without glob:

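from pathlib import Path
import pandas as pd

file_path = Path(r'C:\Users\server\Desktop\Dataset')
mean_csv_pattern = 'P{}_mean_fle.csv'
std_dev_pattern = 'P{}_std_dev_fle.csv'

i = 0
while True:
    i += 1
    try:
        mean_fle = pd.read_csv(file_path / mean_csv_pattern.format(i)).values
        std_dev_fle = pd.read_csv(file_path / std_dev_pattern.format(i)).values
    except FileNotFoundError:
        # Stop at the first index for which either file is missing
        break
    results = tuning(mean_fle, std_dev_fle)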

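2. Prefetch all the files first and put them into a structure that the main loop can look up.

Glob for the mean files, glob for the std_dev files, extract the number from each file name, and build a dictionary {index: {'mean_file': mean_file, 'std_file': std_file}}. Then loop over the sorted dictionary keys.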
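
The prefetch idea could be sketched roughly as follows. This is only an illustration, assuming the file names follow exactly the P<number>_mean_fle.csv / P<number>_std_dev_fle.csv scheme from the question; the helper name pair_files and the dictionary keys 'mean' and 'std_dev' are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def pair_files(directory):
    """Group the mean and std_dev files of a directory by their P<number> index.

    pair_files is a hypothetical helper, not part of pathlib.
    """
    pairs = defaultdict(dict)
    for kind in ("mean", "std_dev"):
        # One glob per file kind; the regex extracts the numeric index.
        name_pattern = re.compile(rf"P(\d+)_{kind}_fle\.csv")
        for path in Path(directory).glob(f"P*_{kind}_fle.csv"):
            match = name_pattern.fullmatch(path.name)
            if match:
                pairs[int(match.group(1))][kind] = path
    # Keep only indices that have both files, ordered numerically.
    return {i: pairs[i] for i in sorted(pairs) if len(pairs[i]) == 2}
```

In the main loop you would then iterate with `for index, files in pair_files(file_path).items():` and pass `pd.read_csv(files['mean']).values` and `pd.read_csv(files['std_dev']).values` to `tuning`. Because the keys are integers, P10 sorts after P2, and an index with only one of the two files is skipped.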

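If your file names follow a deterministic rule, as they do in your example, the best approach is to iterate over one kind of file and find the matching file by string replacement.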

from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'

for i, fle in enumerate(Path(file_path).glob(param_file)):
    stddev_fle = fle.with_name(fle.name.replace("mean", "std_dev"))
    mean_values = pd.read_csv(fle).values
    stddev_values = pd.read_csv(stddev_fle).values

    results = tuning(mean_values, stddev_values)
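
If some mean file might lack a std_dev partner, the same idea can be wrapped in a small generator that skips incomplete pairs. This is a sketch under that assumption; iter_pairs is a hypothetical name, not something pathlib provides.

```python
from pathlib import Path


def iter_pairs(directory):
    """Yield (mean_path, std_dev_path) for every mean file that has a partner."""
    # sorted() orders the paths lexicographically, so P10 comes before P2.
    for mean_path in sorted(Path(directory).glob("P*_mean_fle.csv")):
        std_path = mean_path.with_name(
            mean_path.name.replace("_mean_", "_std_dev_")
        )
        # Skip mean files whose std_dev counterpart does not exist.
        if std_path.is_file():
            yield mean_path, std_path
```

The loop body then becomes `for mean_path, std_path in iter_pairs(file_path):` followed by the two `pd.read_csv(...)` calls and the call to `tuning`.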