F-Test with P-value in Python
R lets us compute an F-test between two populations:
> d1 = c(2.5579227634, 1.7774243136, 2.0025207896, 1.9518876366, 0.0, 4.1984191803, 5.6170403364, 0.0)
> d2 = c(16.93800333, 23.2837045311, 1.2674791828, 1.0889208427, 1.0447584137, 0.8971380534, 0.0, 0.0)
> var.test(d1,d2)
F test to compare two variances
data: d1 and d2
F = 0.0439, num df = 7, denom df = 7, p-value = 0.000523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.008789447 0.219288957
sample estimates:
ratio of variances
0.04390249
Note that it also reports the p-value.
Another example; R gives this:
> x1 = c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 68.7169110318)
> x2 = c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.1863361211)
> var.test(x1,x2)
#p-value = 1.223e-09
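What is the equivalent in Python?
I checked this documentation, but it doesn't seem to have what I'm looking for.
This code gives different p-values (especially for example 2):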
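import statistics as stats
import scipy.stats as ss

def Ftest_pvalue(d1, d2):
    """Two-tailed p-value of the F-test for equal variances."""
    df1 = len(d1) - 1
    df2 = len(d2) - 1
    # Ratio of sample variances (n - 1 denominators)
    F = stats.variance(d1) / stats.variance(d2)
    # Lower-tail probability P(X <= F) for X ~ F(df1, df2)
    single_tailed_pval = ss.f.cdf(F, df1, df2)
    # Doubled to obtain a two-tailed value
    double_tailed_pval = single_tailed_pval * 2
    return double_tailed_pval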
Python gives this:
In [45]: d1 = [2.5579227634, 1.7774243136, 2.0025207896, 1.9518876366, 0.0, 4.1984191803, 5.6170403364, 0.0]
In [20]: d2 = [16.93800333, 23.2837045311, 1.2674791828, 1.0889208427, 1.0447584137, 0.8971380534, 0.0, 0.0]
In [64]: x1 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 68.7169110318]
In [65]: x2 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.1863361211]
In [69]: Ftest_pvalue(d1,d2)
Out[69]: 0.00052297887612346176
In [70]: Ftest_pvalue(x1,x2)
Out[70]: 1.9999999987772916
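A quick check of where the value near 2 comes from, using only the data in the question. In example 2 each sample is seven zeros plus one value a, whose sample variance works out to a²/8, so the variance ratio is (68.7169110318 / 2.1863361211)², about 988. That is deep in the upper tail of F(7, 7), so `ss.f.cdf` (the lower-tail probability) is almost 1 and doubling it gives almost 2; the small tail here is the upper one, `ss.f.sf`. This sketch assumes the two-sided p-value R reports is twice the smaller of the two tails.

```python
import statistics

from scipy import stats

# Example 2 from the question: seven zeros plus one nonzero value each.
x1 = [0.0] * 7 + [68.7169110318]
x2 = [0.0] * 7 + [2.1863361211]

df1, df2 = len(x1) - 1, len(x2) - 1
ratio = statistics.variance(x1) / statistics.variance(x2)  # about 988

lower = stats.f.cdf(ratio, df1, df2)  # P(X <= ratio), almost 1 here
upper = stats.f.sf(ratio, df1, df2)   # P(X >= ratio), the small tail here

naive = 2 * lower                    # what Ftest_pvalue computes
two_sided = 2 * min(lower, upper)    # double the smaller tail instead

print(f"naive = {naive:.10f}, two-sided = {two_sided:.4g}")
```

Here `naive` reproduces `Out[70]`, while `two_sided` comes out at about 1.223e-09, in line with R's `p-value = 1.223e-09`.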
An rpy2 implementation:
import rpy2.robjects as robjects

def Ftest_pvalue_rpy2(d1, d2):
    """Two-tailed F-test p-value computed by R's var.test through rpy2."""
    rd1 = robjects.FloatVector(d1)
    rd2 = robjects.FloatVector(d2)
    rvtest = robjects.r['var.test']
    # Element 2 of the returned htest list is p.value
    return rvtest(rd1, rd2)[2][0]
The result:
In [4]: x1 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 68.7169110318]
In [5]: x2 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.1863361211]
In [6]: Ftest_pvalue_rpy2(x1,x2)
Out[6]: 1.2227086010341282e-09
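For comparison, a pure-SciPy sketch that reproduces the rest of the `var.test` printout (statistic, degrees of freedom, p-value and 95 percent confidence interval) without calling R. It assumes R's two-sided rule is to double the smaller tail, `2 * min(P, 1 - P)`, and that the interval is the observed ratio divided by the upper and lower F quantiles; `var_test` is a hypothetical helper name, not a SciPy function.

```python
import statistics

from scipy import stats


def var_test(x, y, conf_level=0.95):
    """Two-sided F-test for equal variances (hypothetical helper modeled on R's var.test).

    Returns (F, df_num, df_den, p_value, (ci_low, ci_high)).
    """
    df_num = len(x) - 1
    df_den = len(y) - 1
    # Ratio of sample variances (n - 1 denominators, like R's var()).
    f_stat = statistics.variance(x) / statistics.variance(y)
    dist = stats.f(df_num, df_den)
    # Two-sided p-value: twice the smaller tail, so it never exceeds 1.
    p_value = 2 * min(dist.cdf(f_stat), dist.sf(f_stat))
    # Confidence interval for the true ratio of variances.
    beta = (1 - conf_level) / 2
    ci = (f_stat / dist.ppf(1 - beta), f_stat / dist.ppf(beta))
    return f_stat, df_num, df_den, p_value, ci


d1 = [2.5579227634, 1.7774243136, 2.0025207896, 1.9518876366, 0.0, 4.1984191803, 5.6170403364, 0.0]
d2 = [16.93800333, 23.2837045311, 1.2674791828, 1.0889208427, 1.0447584137, 0.8971380534, 0.0, 0.0]

f_stat, df_num, df_den, p_value, ci = var_test(d1, d2)
print(f"F = {f_stat:.4f}, num df = {df_num}, denom df = {df_den}, p-value = {p_value:.3g}")
print(f"95 percent confidence interval: {ci[0]:.9f} {ci[1]:.9f}")
```

On `d1` and `d2` this should match the first R printout (F = 0.0439, p-value = 0.000523, interval 0.008789447 to 0.219288957). Unlike `Ftest_pvalue`, the p-value cannot exceed 1 because only the smaller tail is doubled.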
I should mention that xalglib, a package full of statistical methods, can do this:
http://www.alglib.net/
http://www.alglib.net/hypothesistesting/variancetests.php
although it is not as flexible as the original scipy-based approach.
I should also mention that the correct two-tailed calculation can be found in variancetests.c:
stat = ae_minreal(xvar/yvar, yvar/xvar, _state);
*bothtails = 1-(fdistribution(df1, df2, 1/stat, _state)-fdistribution(df1, df2, stat, _state));
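What @Amit Kumar Gupta describes in his comment, by contrast, is wrong: if you simply double the difference between 1 and the one-tailed p-value, you can get values above 1.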
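A short derivation (added here, not taken from the ALGLIB source) of how the `variancetests.c` expression relates to R's rule. Assume both samples have the same size, so both degrees of freedom equal d (d = 7 in both examples above); let G be the CDF of F(d, d), r the observed variance ratio, and s = min(r, 1/r), which matches `stat` in the C code. The key fact is that if X ~ F(d, d), then 1/X ~ F(d, d) as well.

```latex
\begin{aligned}
X \sim F(d,d) \;&\Rightarrow\; 1/X \sim F(d,d)
  \;\Rightarrow\; 1 - G(1/t) = G(t), \qquad G(1) = \tfrac{1}{2} \\[4pt]
p_{\text{ALGLIB}} &= 1 - \bigl[G(1/s) - G(s)\bigr]
  = G(s) + \bigl[1 - G(1/s)\bigr] = 2\,G(s) \\
  &= 2\min\{G(r),\; 1 - G(r)\} \\[4pt]
2\,G(r) &> 1 \ \text{ if } r > 1, \qquad 2\,[1 - G(r)] > 1 \ \text{ if } r < 1
\end{aligned}
```

The step to the minimum uses G(r) ≤ 1/2 when r ≤ 1 (so s = r) and G(1/r) = 1 − G(r) when r > 1 (so s = 1/r). The last line shows why doubling a fixed tail can exceed 1 on the wrong side of the median, as in `Out[70]`. With unequal sample sizes the reciprocal symmetry no longer holds, so the ALGLIB expression and R's rule need not agree there.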