Performing a statistical test on every possible file combination in a folder

Question

I have a folder containing about 100 CSV files. I want to run a two-sample Kolmogorov-Smirnov test on every possible combination of files. I can do this manually like so:

import pandas as pd
from scipy import stats  # import the submodule explicitly; a bare "import scipy" may not load scipy.stats

df = pd.read_csv(r'file1.csv')
df2 = pd.read_csv(r'file2.csv')
stats.ks_2samp(df, df2)

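But I don't want to assign all the variables by hand. Is there a way to iterate over the files and compare every possible combination with the statistical test?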

Answer 1

It sounds like you want the Cartesian product of the list of file names with itself.

See: Cartesian product of lists in Python

In your implementation, you should collect all the file names in a single list and then call

itertools.product(files, files)
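One way the `files` list might be built is with `pathlib`. This is only a sketch: it assumes the CSV files sit directly in one folder, and the helper name `csv_pairs` is hypothetical. Note that the product pairs every file with itself and yields each pair in both orders.

```python
import itertools
from pathlib import Path


def csv_pairs(folder):
    """Return every ordered pair of CSV paths in `folder`, self-pairs included."""
    # Sort so the pairs come out in a predictable order.
    files = sorted(Path(folder).glob("*.csv"))
    return list(itertools.product(files, files))
```

For a folder with n CSV files this returns n * n pairs, so roughly 10,000 for about 100 files.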

The documentation for itertools.product states that product(A, B) returns the same as

((x,y) for x in A for y in B)
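Because the two-sample KS test is symmetric, the full product does redundant work: it compares each file with itself and visits every pair twice. A possible alternative, which is not part of the answer above, is `itertools.combinations`, which yields each unordered pair exactly once (4,950 pairs for 100 files). The sketch below assumes each CSV has one numeric column to compare; the helper names and the `column` parameter are hypothetical.

```python
import itertools
from pathlib import Path


def compare_all_pairs(samples, test):
    """Apply test(a, b) to every unordered pair of entries in `samples`.

    `samples` maps a name to its data; each pair is visited exactly once.
    """
    results = {}
    for name_a, name_b in itertools.combinations(sorted(samples), 2):
        results[(name_a, name_b)] = test(samples[name_a], samples[name_b])
    return results


def ks_for_folder(folder, column):
    """Run a two-sample KS test on `column` for every pair of CSVs in `folder`."""
    # Imported here so compare_all_pairs stays usable without pandas/SciPy.
    import pandas as pd
    from scipy import stats

    # Read each file once instead of once per pair.
    samples = {
        path.name: pd.read_csv(path)[column].dropna().to_numpy()
        for path in Path(folder).glob("*.csv")
    }
    return compare_all_pairs(samples, stats.ks_2samp)
```

Calling something like `ks_for_folder("data", "value")` (both arguments are placeholders) would return a dict keyed by pairs of file names; each value is the result object from `ks_2samp`, which exposes `.statistic` and `.pvalue`.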
