How to iterate through a directory and read in 2 files at each iteration using pathlib.Path().glob()
Using pathlib.Path().glob(), how do we iterate through a directory and read in 2 files at each iteration?
Suppose my directory C:\Users\server\Desktop\Dataset looks like this:
P1_mean_fle.csv
P2_mean_fle.csv
P3_mean_fle.csv
P1_std_dev_fle.csv
P2_std_dev_fle.csv
P3_std_dev_fle.csv
If I only wanted to read in 1 file at each iteration of Pi, my code would look like this:
from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'
for i, fle in enumerate(Path(file_path).glob(param_file)):
    mean_fle = pd.read_csv(fle).values
    results = tuning(mean_fle)  # tuning is some function which takes in the file mean
                                # and does something with this file
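Now, how do I read in 2 files at each iteration of Pi? The code below doesn't work, because param_file can only hold one file-name pattern at a time. If there is a way to do this using pathlib, I would appreciate it.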
from pathlib import Path
import pandas as pd

param_file = 'P*' + '_mean_fle.csv'
param_file = 'P*' + '_std_dev_fle.csv'  # this is wrong
for i, fle in enumerate(Path(file_path).glob(param_file)):  # this is wrong inside the glob() part
    mean_fle = pd.read_csv(fle).values
    std_dev_fle = pd.read_csv(fle).values
    results = tuning(mean_fle, std_dev_fle)  # tuning is some function which takes in the two files mean
                                             # and std_dev and does something with these 2 files
Thanks in advance.
I'd suggest two approaches:
1. If you are sure that your file numbering has no 'holes', you can do without glob:
import pandas as pd

mean_csv_pattern = 'P{}_mean_fle.csv'
std_dev_pattern = 'P{}_std_dev_fle.csv'
i = 0
while True:
    i += 1
    try:
        mean_fle = pd.read_csv(mean_csv_pattern.format(i)).values
        std_dev_fle = pd.read_csv(std_dev_pattern.format(i)).values
    except FileNotFoundError:  # assumed here; put whichever exceptions should end the loop
        break  # the first missing index ends the loop
    results = tuning(mean_fle, std_dev_fle)
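2. Use a prefetch step to collect all the files and put them in a structure that you can query in the main loop.
Glob for the mean files, glob for the std_dev files, extract the number from each file name, and build a dictionary {index: {'mean_file': mean_file, 'std_file': std_file}}.
Then loop over the sorted dictionary keys...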
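A minimal sketch of that prefetch idea, in case it helps. The helper names pair_files and iter_pairs and the regular expression are my own assumptions, not anything from pathlib; the file-name patterns are the ones from the question. Incomplete pairs (a mean file with no std_dev file, or the reverse) are simply skipped here, which may or may not be what you want.

```python
import re
from pathlib import Path


def pair_files(directory):
    """Return {index: {'mean_file': Path, 'std_file': Path}} for the files in directory."""
    directory = Path(directory)
    number = re.compile(r'^P(\d+)_')  # captures the digits right after the leading 'P'
    pairs = {}
    for key, pattern in (('mean_file', 'P*_mean_fle.csv'),
                         ('std_file', 'P*_std_dev_fle.csv')):
        for path in directory.glob(pattern):
            match = number.match(path.name)
            if match:  # ignore names without a number, e.g. 'Pabc_mean_fle.csv'
                pairs.setdefault(int(match.group(1)), {})[key] = path
    return pairs


def iter_pairs(directory):
    """Yield (index, mean_path, std_path) in numeric order, skipping incomplete pairs."""
    pairs = pair_files(directory)
    for index in sorted(pairs):
        entry = pairs[index]
        if 'mean_file' in entry and 'std_file' in entry:
            yield index, entry['mean_file'], entry['std_file']
```

The main loop would then be something like `for i, mean_path, std_path in iter_pairs(file_path):`, reading both paths with pd.read_csv(...).values and passing the two arrays to tuning. Because the keys are integers, P10 sorts after P2 rather than before it.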
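If your file names follow a deterministic rule, as they do in the example, your best bet is to iterate over one kind of file and find the corresponding file by string replacement.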
from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'
for i, fle in enumerate(Path(file_path).glob(param_file)):
    stddev_fle = fle.with_name(fle.name.replace("mean", "std_dev"))
    mean_values = pd.read_csv(fle).values
    stddev_values = pd.read_csv(stddev_fle).values
    results = tuning(mean_values, stddev_values)
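If you also care about the order of the iterations or about missing companion files, the same idea can be hardened a little. This is only a sketch under my own assumptions: mean_std_pairs is a made-up helper name, the suffixes are the ones from the question, and a mean file without a matching std_dev file is silently skipped. It sorts by the number after 'P', since glob() makes no promise about ordering and a plain name sort would put P10 before P2.

```python
import re
from pathlib import Path


def mean_std_pairs(directory):
    """Yield (mean_path, std_path) pairs ordered by the number in 'P<number>_...'."""
    def index_of(path):
        match = re.match(r'P(\d+)_', path.name)
        # Names without a number sort last instead of raising.
        return int(match.group(1)) if match else float('inf')

    for mean_path in sorted(Path(directory).glob('P*_mean_fle.csv'), key=index_of):
        # Swap only the suffix, so a 'mean' elsewhere in the name is left alone.
        std_name = mean_path.name[:-len('_mean_fle.csv')] + '_std_dev_fle.csv'
        std_path = mean_path.with_name(std_name)
        if std_path.exists():  # skip mean files that have no companion
            yield mean_path, std_path
```

In the loop above you would iterate with `for i, (fle, stddev_fle) in enumerate(mean_std_pairs(file_path)):` and read both files exactly as before.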