Different results with weighted t test between R and Python
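I'm running weighted t-tests in R and Python, and I'm seeing different results. It appears the problem is the degrees-of-freedom calculation. I'd like to understand why I'm seeing different output.
Here is some sample code.
In R: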
library(weights)
x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
weightsa = c(rep(1,8), rep(2,8))
weightsb = c(rep(2,8), rep(1,8))
wtd.t.test(x = x,
y = y,
weight = weightsa,
weighty = weightsb, samedata=F)
$test
[1] "Two Sample Weighted T-Test (Welch)"
$coefficients
t.value df p.value
-1.88907197 29.93637837 0.06860382
$additional
Difference Mean.x Mean.y Std. Err
-80.50000 267.12500 347.62500 42.61352
In Python:
import numpy as np
from statsmodels.stats.weightstats import ttest_ind
x = np.asarray([373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260])
y = np.asarray([411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303])
weightsa = [1] * 8 + [2] * 8
weightsb = [2] * 8 + [1] * 8
ttest_ind(x, y, usevar='unequal', weights=(weightsa, weightsb))
(-2.3391969704691085, 0.023733058922455107, 45.90244683439944)
The p-value is .06 in R and .02 in Python.
The R source code computes the degrees of freedom with the Satterthwaite formula:
df <- (((vx/n) + (vy/n2))^2)/((((vx/n)^2)/(n - 1)) +
((vy/n2)^2/(n2 - 1)))
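In conventional notation, writing v_x and v_y for the two weighted variances and n and n_2 for the two group sizes (in wtd.t.test these are the sums of the weights), this is the Welch-Satterthwaite approximation:

$$
\mathrm{df} = \frac{\left(\dfrac{v_x}{n} + \dfrac{v_y}{n_2}\right)^2}{\dfrac{(v_x/n)^2}{n-1} + \dfrac{(v_y/n_2)^2}{n_2-1}}
$$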
The Python function's source code also claims to use this formula:
def dof_satt(self):
    '''degrees of freedom of Satterthwaite for unequal variance
    '''
    d1 = self.d1
    d2 = self.d2
    #this follows blindly the SPSS manual
    #except I use ``_var`` which has ddof=0
    sem1 = d1._var / (d1.nobs-1)
    sem2 = d2._var / (d2.nobs-1)
    semsum = sem1 + sem2
    z1 = (sem1 / semsum)**2 / (d1.nobs - 1)
    z2 = (sem2 / semsum)**2 / (d2.nobs - 1)
    dof = 1. / (z1 + z2)
    return dof
The numerators here look the same, but the denominators look very different.
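Algebraically, though, the two expressions are the same. In dof_satt, sem1 = d1._var / (d1.nobs - 1) is the ddof=0 variance divided by n - 1, which equals the unbiased variance divided by n, i.e. R's vx/n; and 1. / (z1 + z2) turns into R's ratio once numerator and denominator are multiplied by semsum**2. A small numeric check of that claim, using re-typed versions of both formulas (the names dof_r and dof_statsmodels are made up here) and arbitrary hypothetical inputs:

```python
import math


def dof_r(vx, vy, n, n2):
    # R's wtd.t.test expression; vx, vy are unbiased variances (divisor n - 1)
    return ((vx / n + vy / n2) ** 2) / (
        ((vx / n) ** 2) / (n - 1) + ((vy / n2) ** 2) / (n2 - 1)
    )


def dof_statsmodels(var0_1, var0_2, nobs1, nobs2):
    # statsmodels' dof_satt; var0_* are ddof=0 variances (divisor nobs)
    sem1 = var0_1 / (nobs1 - 1)
    sem2 = var0_2 / (nobs2 - 1)
    semsum = sem1 + sem2
    z1 = (sem1 / semsum) ** 2 / (nobs1 - 1)
    z2 = (sem2 / semsum) ** 2 / (nobs2 - 1)
    return 1.0 / (z1 + z2)


# Hypothetical inputs: sums of squared deviations and group sizes
ss1, ss2, n1, n2 = 1000.0, 2500.0, 10, 15
r = dof_r(ss1 / (n1 - 1), ss2 / (n2 - 1), n1, n2)
sm = dof_statsmodels(ss1 / n1, ss2 / n2, n1, n2)
print(r, sm)  # equal up to floating-point rounding
```

So the formula is not where the two implementations diverge; what goes into n and n2 is.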
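The problem you're running into here is that weights::wtd.t.test() has a (to me, rather odd) default argument, mean1 = TRUE, which controls "whether the weights should be forced to have an average value of 1" (from help("wtd.t.test")).
With your weights, that rescaling shrinks the effective sample size of each group (the sum of its weights) from 24 to 16. Both the standard error and the Satterthwaite degrees of freedom depend on that sum, so the t value, df, and p-value all change, not only df. ttest_ind() uses the weights as given, so its nobs is 24 per group.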
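To see the effect of mean1 without R, here is a minimal NumPy sketch of the Welch computation. The function wtd_welch is hypothetical, not a port of the package source: it assumes the weighted variance uses the divisor sum(w) - 1 and that n and n2 are the sums of the weights. Under those assumptions it lines up with the t.value, df, and Std. Err figures of both R outputs in this post, and toggling mean1 changes only the weights, not the formulas.

```python
import numpy as np


def wtd_welch(x, y, wx, wy, mean1=True):
    """Hypothetical sketch of a weighted Welch t-test (t, df, se).

    Assumes weighted variances use the divisor sum(w) - 1 and that the
    group sizes are the sums of the weights.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    wx, wy = np.asarray(wx, dtype=float), np.asarray(wy, dtype=float)
    if mean1:
        # Mimic mean1 = TRUE: rescale each weight vector to average 1
        wx = wx / wx.mean()
        wy = wy / wy.mean()
    n, n2 = wx.sum(), wy.sum()  # effective sample sizes
    mx = np.average(x, weights=wx)
    my = np.average(y, weights=wy)
    vx = np.sum(wx * (x - mx) ** 2) / (n - 1)
    vy = np.sum(wy * (y - my) ** 2) / (n2 - 1)
    se = np.sqrt(vx / n + vy / n2)
    # Welch-Satterthwaite degrees of freedom, same form as the R code
    df = (vx / n + vy / n2) ** 2 / (
        (vx / n) ** 2 / (n - 1) + (vy / n2) ** 2 / (n2 - 1)
    )
    return (mx - my) / se, df, se


x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]
wa = [1] * 8 + [2] * 8
wb = [2] * 8 + [1] * 8

for flag in (True, False):
    t, df, se = wtd_welch(x, y, wa, wb, mean1=flag)
    print(f"mean1={flag}: t={t:.6f} df={df:.4f} se={se:.5f}")
```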
If we use mean1 = FALSE, we get the same behavior as ttest_ind():
wtd.t.test(x = x,
y = y,
weight = weightsa,
weighty = weightsb,
samedata = FALSE,
mean1 = FALSE)
$test
[1] "Two Sample Weighted T-Test (Welch)"
$coefficients
t.value df p.value
-2.33919697 45.90244683 0.02373306
$additional
Difference Mean.x Mean.y Std. Err
-80.50000 267.12500 347.62500 34.41352
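Going the other direction, R's default mean1 = TRUE result can presumably be reproduced from Python by rescaling each weight vector to average 1 before handing it to ttest_ind(). This is a sketch that assumes the statsmodels behavior seen in the question: ttest_ind returns (tstat, pvalue, df) and treats the sum of the weights as the sample size.

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

x = np.asarray([373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260])
y = np.asarray([411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303])
weightsa = np.asarray([1] * 8 + [2] * 8, dtype=float)
weightsb = np.asarray([2] * 8 + [1] * 8, dtype=float)

# Mimic R's mean1 = TRUE: force each group's weights to average 1,
# so each group's sum of weights (its nobs) becomes 16 instead of 24
tstat, pvalue, dof = ttest_ind(
    x,
    y,
    usevar='unequal',
    weights=(weightsa / weightsa.mean(), weightsb / weightsb.mean()),
)
print(tstat, pvalue, dof)
```

Which convention is right depends on what the weights mean: if they are frequency counts, the raw sum is the sample size; if they only encode relative importance, normalizing them as wtd.t.test does by default keeps n equal to the number of observations.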