How can I get a "tidy" result from purrr::map2?
Given a data frame containing two repeated measures of several variables (i.e. A1, A2, B1, B2):
library(purrr)
library(dplyr)   # provides select() and contains(), used below
library(tidyr)
library(broom)
set.seed(123)
my_df = data.frame(matrix(rnorm(80), nrow=10))
colnames(my_df) <- c("A1_BEFORE", "A1_AFTER", "A2_BEFORE", "A2_AFTER",
"B1_BEFORE", "B1_AFTER", "B2_BEFORE", "B2_AFTER")
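How can I iterate over the pairs of the same variable (before, after) using functional programming principles and get a "tidy" result? Here is my attempt: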
bef <- select(my_df, contains("BEFORE"))
aft <- select(my_df, contains("AFTER"))
result <- map2(bef, aft, t.test, paired = T)
The above produces a list of nested lists. How can I get a "tidy" result? Wrapping the call in tidy() fails:
result <- tidy(map2(bef, aft, t.test, paired = T))
Error in tidy.list(map2(bef, aft, t.test, paired = T)) :
No tidying method recognized for this list
In addition: Warning message:
In sort(names(x)) == c("d", "u", "v") :
longer object length is not a multiple of shorter object length
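The error follows from what map2 returns: an ordinary list whose elements are each an htest object. broom has a tidy method for a single htest, but not for an arbitrary list of them. A minimal sketch that checks this, rebuilding the question's data (the base-R column selection is only a stand-in for the select() calls above):

```r
library(purrr)
library(broom)

set.seed(123)
my_df <- data.frame(matrix(rnorm(80), nrow = 10))
colnames(my_df) <- c("A1_BEFORE", "A1_AFTER", "A2_BEFORE", "A2_AFTER",
                     "B1_BEFORE", "B1_AFTER", "B2_BEFORE", "B2_AFTER")

# Base-R equivalent of select(my_df, contains(...))
bef <- my_df[grepl("BEFORE", names(my_df))]
aft <- my_df[grepl("AFTER", names(my_df))]

result <- map2(bef, aft, t.test, paired = TRUE)

class(result)       # "list": a plain list, one element per column pair
class(result[[1]])  # "htest": each element is a single test result

# tidy() works on one test at a time and returns a one-row data frame
one <- tidy(result[[1]])
```

So tidy() has to be applied to each element, not to the list as a whole.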
We can use map_df, because the result of map2 is a list:
map2(bef, aft, t.test, paired = TRUE) %>%
map_df(tidy)
# estimate statistic p.value parameter conf.low conf.high method
#1 -0.1339963 -0.4613684 0.65548187 9 -0.7909999 0.5230073 Paired t-test
#2 -0.7466034 -1.8820475 0.09250351 9 -1.6439954 0.1507885 Paired t-test
#3 -0.2304015 -0.5740849 0.57997286 9 -1.1382891 0.6774860 Paired t-test
#4 0.4860015 1.3468795 0.21095133 9 -0.3302644 1.3022674 Paired t-test
# alternative
#1 two.sided
#2 two.sided
#3 two.sided
#4 two.sided
Or, more compactly:
map2_df(bef, aft, ~tidy(t.test(.x, .y, paired = TRUE)))
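The rows in this output are not labelled with the variable they belong to. One possible way to keep a label, sketched here as an assumption and not part of the original answer, is the `.id` argument of map2_dfr (the same function as map2_df), which stores the names of the first input in a new column. The column name `var` and the renaming step are illustrative choices:

```r
library(purrr)
library(broom)

set.seed(123)
my_df <- data.frame(matrix(rnorm(80), nrow = 10))
colnames(my_df) <- c("A1_BEFORE", "A1_AFTER", "A2_BEFORE", "A2_AFTER",
                     "B1_BEFORE", "B1_AFTER", "B2_BEFORE", "B2_AFTER")

bef <- my_df[grepl("BEFORE", names(my_df))]
aft <- my_df[grepl("AFTER", names(my_df))]

# Strip the suffix so the label is just the variable name (A1, A2, ...)
names(bef) <- sub("_BEFORE$", "", names(bef))

# .id = "var" adds a column holding the names of `bef`
labelled <- map2_dfr(bef, aft,
                     ~ tidy(t.test(.x, .y, paired = TRUE)),
                     .id = "var")
labelled
```

Note that map2_dfr relies on dplyr being installed, since it binds the rows with dplyr::bind_rows.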
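Here is another approach, which tidies the data before running the t-tests. It performs the same paired tests, but labels the variable being tested in the final output.
Only the data changes: an id variable is added to index the repeated measures.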
This requires broom and tidyr in addition to dplyr:
# library() attaches one package per call
library(tidyr)
library(dplyr)
library(broom)
Restructure with tidyr:
my_tidy_df <- my_df %>%
mutate(id = row_number()) %>% # needs an id to group repeated measure
gather(var, value, -id) %>%
extract(var, c("var", "timepoint"), "([[:alnum:]]+)_([[:alnum:]]+)") %>%
spread(timepoint, value)
This gives the following structure:
id var AFTER BEFORE
1 1 A1 -1.14854253 -0.9032172
2 1 A2 2.36114529 -0.6500869
3 1 B1 0.26204456 -0.5477532
4 1 B2 -1.34416890 -0.4696884
5 2 A1 0.53400345 1.2722203
You can then run a t-test for each variable as follows:
my_tidy_df %>%
group_by(var) %>%
do(broom::tidy(t.test(.$BEFORE, .$AFTER, paired = TRUE)))  # data = . dropped: it has no effect when vectors are passed
Result:
# Groups: var [4]
var estimate statistic p.value parameter conf.low conf.high method alternative
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <fctr>
1 A1 0.16014628 0.3470400 0.7365381 9 -0.8837567 1.2040493 Paired t-test two.sided
2 A2 -0.99798993 -1.6271640 0.1381451 9 -2.3854407 0.3894609 Paired t-test two.sided
3 B1 0.04916586 0.1289803 0.9002097 9 -0.8131436 0.9114753 Paired t-test two.sided
4 B2 -0.06919212 -0.1833619 0.8585784 9 -0.9228233 0.7844391 Paired t-test two.sided
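In current tidyverse releases, gather, spread and do are superseded. A sketch of the same reshape-then-test idea using pivot_longer, pivot_wider and group_modify could look like the following; it assumes tidyr 1.0 or later and dplyr 0.8.1 or later, and is an alternative spelling, not the original answer's code:

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(123)
my_df <- data.frame(matrix(rnorm(80), nrow = 10))
colnames(my_df) <- c("A1_BEFORE", "A1_AFTER", "A2_BEFORE", "A2_AFTER",
                     "B1_BEFORE", "B1_AFTER", "B2_BEFORE", "B2_AFTER")

my_tidy_df <- my_df %>%
  mutate(id = row_number()) %>%                 # id indexes the repeated measures
  pivot_longer(-id,
               names_to = c("var", "timepoint"),
               names_sep = "_") %>%             # "A1_BEFORE" -> var "A1", timepoint "BEFORE"
  pivot_wider(names_from = timepoint, values_from = value)

res <- my_tidy_df %>%
  group_by(var) %>%
  # .x is the data for one group; the result gets the `var` key prepended
  group_modify(~ tidy(t.test(.x$BEFORE, .x$AFTER, paired = TRUE)))
res
```

The result has one row per variable, with `var` as the first column followed by the usual tidy columns.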