Parallel wilcox.test using group_by and summarise
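There must be an R-ly way to call wilcox.test over multiple observations in parallel using group_by. I've spent a good deal of time reading up on this, but I still can't work out a call to wilcox.test that does the job. Example data and code are below, using magrittr pipes and summarize().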
library(dplyr)
library(magrittr)
# create a data frame where x is the dependent variable, id1 is a category variable (here with five levels), and id2 is a binary category variable used for the two-sample wilcoxon test
df <- data.frame(x=abs(rnorm(50)),id1=rep(1:5,10), id2=rep(1:2,25))
# make sure piping and grouping are called correctly, with "sum" function as a well-behaving example function
df %>% group_by(id1) %>% summarise(s=sum(x))
df %>% group_by(id1,id2) %>% summarise(s=sum(x))
# make sure wilcox.test is called correctly
wilcox.test(x~id2, data=df, paired=FALSE)$p.value
# yet, cannot call wilcox.test within pipe with summarise (regardless of group_by). Expected output is five p-values (one for each level of id1)
df %>% group_by(id1) %>% summarise(w=wilcox.test(x~id2, data=., paired=FALSE)$p.value)
df %>% summarise(wilcox.test(x~id2, data=., paired=FALSE))
# even specifying formula argument by name doesn't help
df %>% group_by(id1) %>% summarise(w=wilcox.test(formula=x~id2, data=., paired=FALSE)$p.value)
The buggy calls produce this error:
Error in wilcox.test.formula(c(1.09057358373486,
2.28465932554436, 0.885617572657959, : 'formula' missing or incorrect
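Thanks for your help; I hope this will be useful to others with similar questions as well.
You can do this with base R (although the result is a cumbersome list):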
by(df, df$id1, function(x) { wilcox.test(x~id2, data=x, paired=FALSE)$p.value })
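If the list returned by by() is awkward to work with, one possible base R alternative is to split the data frame by id1 and apply the test with sapply, which collects the p-values into a named numeric vector. This is only a minimal sketch, assuming the same data layout as in the question; set.seed and the name p_values are my additions for illustration.

```r
# Same layout as the question's data; set.seed is added only for reproducibility
set.seed(1)
df <- data.frame(x = abs(rnorm(50)), id1 = rep(1:5, 10), id2 = rep(1:2, 25))

# split() yields one data frame per level of id1;
# sapply() runs the two-sample test on each piece and collects the p-values
p_values <- sapply(split(df, df$id1), function(d) {
  wilcox.test(x ~ id2, data = d, paired = FALSE)$p.value
})

p_values  # named numeric vector; the names are the levels of id1
```

The anonymous function names its argument d rather than x so that the x in the formula clearly refers to the column, not the data frame.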
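Or use plyr:
# ddply() and .() come from plyr, not dplyr; load plyr before dplyr to avoid masking dplyr verbs
library(plyr)
ddply(df, .(id1), function(x) { wilcox.test(x~id2, data=x, paired=FALSE)$p.value })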
  id1        V1
1   1 0.3095238
2   2 1.0000000
3   3 0.8412698
4   4 0.6904762
5   5 0.3095238
Your task can be accomplished easily with the do function (call ?do after loading the dplyr library). Using your data, the chain looks like this:
df <- data.frame(x=abs(rnorm(50)),id1=rep(1:5,10), id2=rep(1:2,25))
df <- tbl_df(df)
res <- df %>% group_by(id1) %>%
  do(w = wilcox.test(x~id2, data=., paired=FALSE)) %>%
  summarise(id1, Wilcox = w$p.value)
Output:
res
Source: local data frame [5 x 2]
    id1    Wilcox
  (int)     (dbl)
1     1 0.6904762
2     2 0.4206349
3     3 1.0000000
4     4 0.6904762
5     5 1.0000000
Note that I added the do function between group_by and summarize.
Hope this helps.
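Since do() is marked as superseded in recent dplyr releases, another option worth trying is to skip the formula interface and pass the two samples to wilcox.test as plain vectors, which summarise can evaluate within each group. This is a sketch under my own assumptions (set.seed is added for reproducibility, and the column name Wilcox is reused from the answer above); it is not the method from the original answer.

```r
library(dplyr)

# Same layout as the question's data; set.seed is added only for reproducibility
set.seed(1)
df <- data.frame(x = abs(rnorm(50)), id1 = rep(1:5, 10), id2 = rep(1:2, 25))

res <- df %>%
  group_by(id1) %>%
  # Inside summarise, x and id2 are the current group's columns,
  # so the two subsets are the two samples for this level of id1
  summarise(Wilcox = wilcox.test(x[id2 == 1], x[id2 == 2], paired = FALSE)$p.value)

res  # one row per level of id1, with the p-value in Wilcox
```

Because no data argument is involved, this form does not depend on what the `.` placeholder refers to inside the pipe.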