Write a function that performs multiple Student's t-tests for a list of DataFrames
I have this DataFrame:
print(TempvsDType)
CurrentThermostatTemp
DwellingType
Bungalow 0.0
Bungalow 22.0
Bungalow 22.0
Bungalow 25.0
Bungalow 18.0
Bungalow 21.0
Bungalow 22.0
Bungalow 10.0
Bungalow 18.0
Bungalow 20.0
Bungalow 20.0
Bungalow 22.0
Bungalow 20.0
Bungalow 10.0
Bungalow 30.0
Bungalow 22.0
Bungalow 20.0
Bungalow 20.0
Bungalow 19.0
Bungalow 20.0
Bungalow 22.0
Bungalow 20.0
Bungalow 21.0
Bungalow 22.0
Bungalow 15.0
Bungalow 22.0
Bungalow 0.0
Bungalow 24.0
Bungalow 30.0
Bungalow 20.0
... ...
Park Home 20.0
Park Home 23.0
Park Home 20.0
Park Home 20.0
Park Home 20.0
Park Home 18.0
Park Home 20.0
Park Home 15.0
Park Home 12.0
Park Home 20.0
Park Home 20.0
Park Home 23.0
Park Home 21.0
Park Home 20.0
Park Home 20.0
Park Home 20.0
Park Home 23.0
Park Home 18.0
Park Home 20.0
Park Home 18.0
Park Home 16.0
Park Home 17.0
Park Home 20.0
Park Home 20.0
Park Home 18.0
Park Home 18.0
Park Home 20.0
Park Home 20.0
Park Home 15.0
Park Home 21.0
[6247 rows x 1 columns]
I have split it into one DataFrame per dwelling type with the .truncate() method:
Flat = TempvsDType.truncate(before="Flat",after="Flat")
House = TempvsDType.truncate(before="House",after="House")
Bungalow = TempvsDType.truncate(before="Bungalow",after="Bungalow")
Maisonette = TempvsDType.truncate(before="Maisonette",after="Maisonette")
ParkHome = TempvsDType.truncate(before="Park Home",after="Park Home")
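My goal is to run a Student's t-test on every possible combination of these variables, excluding duplicate or repeated pairs. However, I have had to do this by hand, which is very time-consuming, especially in other scripts where there are more than 5 variables and the number of combinations grows considerably. This is my manual approach: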
from scipy.stats import ttest_ind
#All possible combinations:
Flat_House = ttest_ind(Flat,House)
Flat_Bungalow = ttest_ind(Flat,Bungalow)
Flat_Maisonette = ttest_ind(Flat,Maisonette)
Flat_ParkHome = ttest_ind(Flat,ParkHome)
House_Bungalow = ttest_ind(House,Bungalow)
House_Maisonette = ttest_ind(House,Maisonette)
House_ParkHome = ttest_ind(House,ParkHome)
Bungalow_Maisonette = ttest_ind(Bungalow,Maisonette)
Bungalow_ParkHome = ttest_ind(Bungalow,ParkHome)
Maisonette_ParkHome = ttest_ind(Maisonette, ParkHome)
#t-test between each combination
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[1],Flat_House[0],Flat_House[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[2],Flat_Bungalow[0],Flat_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[3],Flat_Maisonette[0],Flat_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[4],Flat_ParkHome[0],Flat_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[2],House_Bungalow[0],House_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[3],House_Maisonette[0],House_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[4],House_ParkHome[0],House_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[3],Bungalow_Maisonette[0],Bungalow_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[4],Bungalow_ParkHome[0],Bungalow_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[3],u[4],Maisonette_ParkHome[0],Maisonette_ParkHome[1]))
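So I would like to know how to write a function that does this automatically, that is, one that prints the Student's t-test for every possible combination, excluding duplicates and already-tested pairs, and returns the output the way I print it manually. I have tried many times without success. I would be glad if someone could help me. Thank you.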
from itertools import combinations
from scipy.stats import ttest_ind

# Build a dictionary {dwelling type: DataFrame} by grouping on DwellingType.
# No drop_duplicates() here: it compares column values only (the index is ignored),
# so it would throw away repeated temperature readings and distort the test.
dfs = dict(tuple(TempvsDType.groupby('DwellingType')))

def ttest(pair):
    # Independent two-sample t-test on the temperature column of the two groups
    results = ttest_ind(dfs[pair[0]]['CurrentThermostatTemp'], dfs[pair[1]]['CurrentThermostatTemp'])
    print(f"t-test between {pair[0]} and {pair[1]} is {results[0]} and p-value: {results[1]}")

# combinations() yields each unordered pair of keys exactly once
for pair in combinations(dfs, 2):
    ttest(pair)
Output:
t-test between Bungalow and Park Home is 0.2594309721800956 and p-value: 0.7984182890048678
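Since the question also asks for the results to be returned and not only printed, here is a minimal sketch of one way to do that by collecting every pair into a results table. The function name pairwise_ttests, its parameters, the output column names, and the small demo frame are all assumptions made for illustration; only the shape of the data (temperatures indexed by DwellingType) comes from the question.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import ttest_ind


def pairwise_ttests(df, value_col, group_level=0):
    """Run an independent two-sample t-test for every unordered pair of groups.

    Hypothetical helper: groups are taken from the index level `group_level`.
    """
    # One Series of values per index label (missing values dropped)
    groups = {name: s.dropna() for name, s in df[value_col].groupby(level=group_level)}
    rows = []
    # combinations() yields each unordered pair once: no (A, A), no (B, A) after (A, B)
    for a, b in combinations(groups, 2):
        stat, pvalue = ttest_ind(groups[a], groups[b])
        rows.append({"group_1": a, "group_2": b, "t_statistic": stat, "p_value": pvalue})
    return pd.DataFrame(rows)


# Tiny made-up frame shaped like TempvsDType (values are illustrative only)
demo = pd.DataFrame(
    {"CurrentThermostatTemp": [20.0, 22.0, 18.0, 21.0, 19.0, 23.0, 20.0, 17.0, 16.0]},
    index=pd.Index(["Flat"] * 3 + ["House"] * 3 + ["Bungalow"] * 3, name="DwellingType"),
)

results = pairwise_ttests(demo, "CurrentThermostatTemp")
for row in results.itertuples(index=False):
    print(f"t-test between {row.group_1} and {row.group_2} "
          f"is {row.t_statistic} and p-value: {row.p_value}")
```

With the real data the call would presumably be pairwise_ttests(TempvsDType, "CurrentThermostatTemp"). Returning a DataFrame keeps the printing separate from the computation, so the same results can be sorted, filtered by p-value, or saved without rerunning the tests.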