How to perform t-tests for each level of a factor with tapply
My data and code look like this:
my_vector <- rnorm(150)
my_factor1 <- gl(3,50)
my_factor2 <- gl(2,75)
tapply(my_vector, my_factor1, function(x)
  t.test(my_vector~my_factor2, paired=T))
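I want to run a separate t-test for each level of my_factor1, comparing my_vector between the two levels of my_factor2.
However, with my code the t-tests are not split by the levels of my_factor1, and every level gives identical results, because the whole of my_vector goes into every t.test call.
This is the output of my code: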
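A minimal sketch of what tapply actually hands to the function: each call receives only the slice of my_vector for one level of my_factor1, as the argument x. A function that ignores x and refers to the global my_vector sees all 150 values every time. The same is true of my_factor2, which is also looked up whole, so replacing my_vector with x inside the formula would not be enough on its own. The names slice_sizes and global_sizes are illustrative.

```r
my_vector  <- rnorm(150)
my_factor1 <- gl(3, 50)

# tapply passes the subset for each level as x, so length(x) is 50 per level
slice_sizes <- tapply(my_vector, my_factor1, function(x) length(x))

# A function that ignores x and uses the global vector sees all 150 values
global_sizes <- tapply(my_vector, my_factor1, function(x) length(my_vector))

print(slice_sizes)
print(global_sizes)
```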
$`1`
Paired t-test
data: my_vector by my_factor2
t = 0.2448, df = 74, p-value = 0.8073
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2866512 0.3669667
sample estimates:
mean of the differences
0.04015775
$`2`
Paired t-test
data: my_vector by my_factor2
t = 0.2448, df = 74, p-value = 0.8073
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2866512 0.3669667
sample estimates:
mean of the differences
0.04015775
$`3`
Paired t-test
data: my_vector by my_factor2
t = 0.2448, df = 74, p-value = 0.8073
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2866512 0.3669667
sample estimates:
mean of the differences
0.04015775
What am I missing or doing wrong?
Your example is a bit problematic, because if you set:
df <- data.frame(my_vector  = rnorm(150),
                 my_factor1 = gl(3, 50),
                 my_factor2 = gl(2, 75)
)
then when my_factor1 is 1 or 3 you will only have one unique value of my_factor2, because of the way your repetitions overlap (see ?gl). Instead, try:
df <- data.frame(my_vector  = rnorm(150),
                 my_factor1 = gl(3, 1, 150),
                 my_factor2 = gl(2, 1, 150)
)
with(df,
  by(df, my_factor1,
     function(x) t.test(my_vector ~ my_factor2, data = x)
  )
)
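To see the overlap described above, a quick cross-tabulation of the two designs can help (a sketch; the object names overlapping and interleaved are illustrative). With gl(3, 50) and gl(2, 75), levels 1 and 3 of my_factor1 each contain only one level of my_factor2, while the interleaved gl(3, 1, 150) and gl(2, 1, 150) put 25 observations in every cell.

```r
# Block-wise design from the question: levels repeat in long runs
overlapping <- table(my_factor1 = gl(3, 50), my_factor2 = gl(2, 75))

# Interleaved design: levels cycle 1, 2, 3, ... and 1, 2, ... row by row
interleaved <- table(my_factor1 = gl(3, 1, 150), my_factor2 = gl(2, 1, 150))

print(overlapping)
print(interleaved)
```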
The by() call above seems to produce the output you want.
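Since the question asks about tapply specifically, here is a sketch of one way to make it work; it is an alternative to by(), not the answer's own method. The idea is to pass row indices as the vector to split, so the function can subset my_vector and my_factor2 consistently. The seed and the names results and p_values are illustrative assumptions.

```r
set.seed(1)  # illustrative seed so the sketch is reproducible
df <- data.frame(my_vector  = rnorm(150),
                 my_factor1 = gl(3, 1, 150),
                 my_factor2 = gl(2, 1, 150))

# Split the row indices by my_factor1; each call gets the rows of one level
results <- tapply(seq_len(nrow(df)), df$my_factor1, function(i) {
  t.test(my_vector ~ my_factor2, data = df[i, ])
})

# Pull out one p-value per level of my_factor1
p_values <- sapply(results, function(tt) tt$p.value)
print(p_values)
```

Passing indices rather than values is what lets a single function see matching subsets of several columns; split(df, df$my_factor1) followed by lapply() achieves the same thing.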
As a side note, consider correcting for multiple comparisons: https://stats.stackexchange.com/questions/16779/when-is-multiple-comparison-correction-necessary
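A minimal sketch of such a correction with base R's p.adjust, assuming the per-level p-values have already been collected into a numeric vector. The three p-values below are hypothetical inputs chosen for easy hand-checking, not results from the data above. Holm's method is one common choice; p.adjust also accepts "bonferroni", "BH" and other methods.

```r
# Hypothetical p-values, one per level of my_factor1 (illustrative only)
p_values <- c(`1` = 0.012, `2` = 0.040, `3` = 0.300)

# Holm's step-down correction controls the family-wise error rate
p_holm <- p.adjust(p_values, method = "holm")

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_values, method = "bonferroni")

print(rbind(raw = p_values, holm = p_holm, bonferroni = p_bonf))
```

Here Holm gives 0.036, 0.080 and 0.300, while Bonferroni gives 0.036, 0.120 and 0.900, so Holm is never more conservative than Bonferroni.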