Code to run ANOVA on different DataFrames where the groups can change
I have the following DataFrame, but it could be any DataFrame in that format.
import pandas as pd

df = pd.DataFrame({
    'Weight': [4.17,5.58,5.18,6.11,4.5,4.61,5.17,4.53,5.33,5.14,4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69,6.31,5.12,5.54,5.5,5.37,5.29,4.92,6.15,5.8,5.26],
    'Group': ['A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C']
})
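How can I run an ANOVA that gives me the F and p values for this data without explicitly naming the groups in the code? In other words, is there code that detects the groups automatically and runs the ANOVA, so that it works on any DataFrame with this structure and not just this one?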
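To test whether the means are similar, either across all groups or between each pair of groups, we can use itertools.combinations and scipy.stats.f_oneway.
The null hypothesis here, quoted from the documentation:
"that two or more groups have the same population mean"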
Scenario 1: compare all groups at once:
from scipy.stats import f_oneway
grps = [d['Weight'] for _, d in df.groupby('Group')]
F, p = f_oneway(*grps)
print(F, p)
4.846087862380136 0.0159099583256229
Scenario 2: compare each pair of groups:
from itertools import combinations
from scipy.stats import f_oneway

# Every unordered pair of distinct group labels, taken from the data itself
for g1, g2 in combinations(df['Group'].unique(), 2):
    F, p = f_oneway(df.loc[df['Group'] == g1, 'Weight'],
                    df.loc[df['Group'] == g2, 'Weight'])
    print(f'For groups {g1} & {g2} the F-value is: {F}, the p-value is: {p}')
Output:
For groups A & B the F-value is: 1.4191012973623165, the p-value is: 0.24902316597300575
For groups A & C the F-value is: 4.554043294351827, the p-value is: 0.04685138491157386
For groups B & C the F-value is: 9.0606932332992, the p-value is: 0.007518426118219876
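Since the question asks for something that works on any DataFrame with this structure, the all-groups comparison could be wrapped in a small helper. This is only a sketch: the function name anova_by_group and its parameters value_col and group_col are made up for illustration, the demo data is invented, and it assumes a long-format frame with one numeric column and one label column.

```python
import pandas as pd
from scipy.stats import f_oneway


def anova_by_group(df, value_col, group_col):
    """One-way ANOVA on value_col, with groups detected from group_col.

    Hypothetical helper, not part of pandas or SciPy.
    """
    # groupby discovers the group labels, so none are hard-coded.
    # Missing values are dropped because f_oneway does not skip NaN by default.
    samples = [g[value_col].dropna() for _, g in df.groupby(group_col)]
    if len(samples) < 2:
        raise ValueError("ANOVA needs at least two groups")
    F, p = f_oneway(*samples)
    return F, p


# Illustrative call on a small made-up frame with different column names
demo = pd.DataFrame({
    'score': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    'label': ['x', 'x', 'x', 'y', 'y', 'y', 'z', 'z', 'z'],
})
F, p = anova_by_group(demo, 'score', 'label')
print(F, p)
```

For the DataFrame in the question, anova_by_group(df, 'Weight', 'Group') should give the same F and p values as Scenario 1, since it runs the same f_oneway call on the same groups.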