分组的 Tidyverse 解决方案,并通过数据帧循环执行 dunn.test
Tidyverse solution to grouping, and looping through dataframe to perform dunn.test
我在下面有一个示例数据框。
example.df <- data.frame(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5))
我想先按 treated
分组,然后循环遍历数据中的所有数值变量以使用 species
作为解释变量,提取 summary
数据框对象并将 yes
和 no
子列表中的列绑定到新的数据框对象中。
我已经使用部分 "tidy" 方法获得了部分解决方案 and ,效果很好。我只是在寻找一个更优雅的 tidyverse 解决方案,以帮助我学习成为更好的 R 用户。
感谢任何帮助。
编辑:这是我从 部分 "tidy" 解决方案得到的输出。
structure(list(var1.Diff = structure(1:2, .Label = c("-7.05229",
"-2.25"), class = "factor"), var1.Lower = structure(1:2, .Label =
c("-13.23198",
"-8.25114"), class = "factor"), var1.Upper = structure(1:2, .Label =
c("-0.87259",
"3.75114"), class = "factor"), var1.Decision = structure(1:2, .Label =
c("Reject H0",
"FTR H0"), class = "factor"), var1.Adj..P.value = structure(1:2, .Label =
c("0.025305",
"0.462433"), class = "factor"), resp.Diff = structure(1:2, .Label =
c("1.10458",
"0"), class = "factor"), resp.Lower = structure(1:2, .Label = c("-5.07512",
"-6.00114"), class = "factor"), resp.Upper = structure(1:2, .Label =
c("7.28427",
"6.00114"), class = "factor"), resp.Decision = structure(c(1L,
1L), .Label = "FTR H0", class = "factor"), resp.Adj..P.value =
structure(1:2, .Label = c("0.726092",
"1"), class = "factor"), effect.Diff = structure(1:2, .Label =
c("-1.27451",
"-0.5625"), class = "factor"), effect.Lower = structure(1:2, .Label =
c("-7.4542",
"-6.56364"), class = "factor"), effect.Upper = structure(1:2, .Label =
c("4.90518",
"5.43864"), class = "factor"), effect.Decision = structure(c(1L,
1L), .Label = "FTR H0", class = "factor"), effect.Adj..P.value =
structure(1:2, .Label = c("0.686047",
"0.85424"), class = "factor")), row.names = c("No", "Yes"), class =
"data.frame")
更新 2022/03/16
tidyverse 已经发展,这个解决方案也应该如此。
library("asbio")
#> Loading required package: tcltk
library("tidyverse")
# It is good practice to set the seed before simulating random data
# Otherwise we can't compare
set.seed(12345)
example.df <-
tibble(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp = rnorm(50, 10, 5), value1 = rnorm(50, 25, 5)
) %>%
# Make sure species is a factor as required by `pair.kw`
mutate(
species = factor(species)
)
example.df %>%
# We want to perform the test separately for each group
group_by(
treated
) %>%
group_modify(
# Perform test and extract summary
~ pairw.kw(.x$var1, .x$species, conf = 0.95)$summary
)
#> # A tibble: 2 × 6
#> # Groups: treated [2]
#> treated Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 No -2.67949 -8.12299 2.76402 FTR H0 0.334663
#> 2 Yes 4.07071 -2.98047 11.12188 FTR H0 0.257843
将这种方法扩展到 运行 6 个测试很简单,每个测试组和一个响应变量(var1
、value1
或 resp
).例如,我们可以将 tibble 从宽格式(三个响应列)转换为窄格式(三个响应按行堆叠),然后像上面一样进行。
responses <- c("value1", "var1", "resp")
example.df %>%
pivot_longer(
all_of(responses),
names_to = "variable"
) %>%
group_by(
treated, variable
) %>%
group_modify(
~ pairw.kw(.x$value, .x$species, conf = 0.95)$summary
)
#> # A tibble: 6 × 7
#> # Groups: treated, variable [6]
#> treated variable Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 No resp -1.46154 -6.90505 3.98197 FTR H0 0.598725
#> 2 No value1 4.62821 -0.8153 10.07171 FTR H0 0.095632
#> 3 No var1 -2.67949 -8.12299 2.76402 FTR H0 0.334663
#> 4 Yes resp 3.75758 -3.2936 10.80875 FTR H0 0.29627
#> 5 Yes value1 -2.97475 -10.02592 4.07643 FTR H0 0.408311
#> 6 Yes var1 4.07071 -2.98047 11.12188 FTR H0 0.257843
由 reprex package (v2.0.1)
于 2022 年 3 月 16 日创建
旧解
这是一个 tidy 方法,可以同时 运行 进行多项测试。
library("asbio")
#> Loading required package: tcltk
library("tidyverse")
# It is good practice to set the seed before simulating random data
# Otherwise can't compare
set.seed(12345)
example.df <- tibble(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5)) %>%
# Make sure species is a factor as required by `pair.kw`
mutate(species = factor(species))
example.df %>%
# Nest the data for each treatment group
nest(- treated) %>%
# Run the test on each treatment group
mutate(test = map(data, ~ pairw.kw(.$var1, .$species, conf = 0.95))) %>%
# There is no broom::tidy method for objects of class pairw
# but we can extract the summary ourselves
mutate(summary = map(test, pluck, "summary")) %>%
# Cast all the factor columns in the summary table to character, to
# avoid a warning about converting factors to characters.
mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
unnest(summary)
#> # A tibble: 2 x 8
#> treated data test Diff Lower Upper Decision `Adj. P-value`
#> <chr> <list> <list> <chr> <chr> <chr> <chr> <chr>
#> 1 No <tibble [2~ <S3: pa~ -1.705~ -7.99~ 4.58~ FTR H0 0.595163
#> 2 Yes <tibble [2~ <S3: pa~ -1.145~ -6.45~ 4.16~ FTR H0 0.672655
将这种方法扩展到 运行 6 个测试很简单,每个测试组和一个响应变量(var1
、value1
或 resp
).例如,我们可以将 tibble 从宽格式(三个响应列)转换为窄格式(三个响应按行堆叠),然后像上面一样进行。
example.df %>%
# From wide format to narrow format
gather(varname, y, value1, var1, resp) %>%
# Nest the data for each treatment group and each variable
nest(- treated, - varname) %>%
# Run 6 tests = (number of treatments) * (number of response variables)
mutate(test = map(data, ~ pairw.kw(.$y, .$species, conf = 0.95))) %>%
# There is no broom::tidy method for objects of class pairw
# but we can extract the summary ourselves
mutate(summary = map(test, pluck, "summary")) %>%
# Cast all the factor columns in the summary table to character, to
# avoid a warning about converting factors to characters.
mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
unnest(summary)
#> # A tibble: 6 x 9
#> treated varname data test Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <list> <list> <chr> <chr> <chr> <chr> <chr>
#> 1 No value1 <tibbl~ <S3: ~ 3.127~ -3.1~ 9.41~ FTR H0 0.329969
#> 2 Yes value1 <tibbl~ <S3: ~ -1.33~ -6.65 3.97~ FTR H0 0.622065
#> 3 No var1 <tibbl~ <S3: ~ -1.70~ -7.9~ 4.58~ FTR H0 0.595163
#> 4 Yes var1 <tibbl~ <S3: ~ -1.14~ -6.4~ 4.16~ FTR H0 0.672655
#> 5 No resp <tibbl~ <S3: ~ 1.421~ -4.8~ 7.71~ FTR H0 0.657905
#> 6 Yes resp <tibbl~ <S3: ~ 1.145~ -4.1~ 6.45~ FTR H0 0.672655
由 reprex package (v0.2.1)
于 2019-03-04 创建
如果您想灵活地设置回复的数量和名称怎么办?然后指定响应列表:
responses <- c("value1", "var1", "resp")
并在 gather
语句中使用 tidy evaluation 如下:
example.df %>%
# From wide format to narrow format, with tidy evaluation
gather(varname, y, !!!rlang::syms(responses))
# Rest of computation here.....
我在下面有一个示例数据框。
example.df <- data.frame(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5))
我想先按 treated
分组,然后循环遍历数据中的所有数值变量以使用 species
作为解释变量,提取 summary
数据框对象并将 yes
和 no
子列表中的列绑定到新的数据框对象中。
我已经使用部分 "tidy" 方法获得了部分解决方案
感谢任何帮助。
编辑:这是我从
structure(list(var1.Diff = structure(1:2, .Label = c("-7.05229",
"-2.25"), class = "factor"), var1.Lower = structure(1:2, .Label =
c("-13.23198",
"-8.25114"), class = "factor"), var1.Upper = structure(1:2, .Label =
c("-0.87259",
"3.75114"), class = "factor"), var1.Decision = structure(1:2, .Label =
c("Reject H0",
"FTR H0"), class = "factor"), var1.Adj..P.value = structure(1:2, .Label =
c("0.025305",
"0.462433"), class = "factor"), resp.Diff = structure(1:2, .Label =
c("1.10458",
"0"), class = "factor"), resp.Lower = structure(1:2, .Label = c("-5.07512",
"-6.00114"), class = "factor"), resp.Upper = structure(1:2, .Label =
c("7.28427",
"6.00114"), class = "factor"), resp.Decision = structure(c(1L,
1L), .Label = "FTR H0", class = "factor"), resp.Adj..P.value =
structure(1:2, .Label = c("0.726092",
"1"), class = "factor"), effect.Diff = structure(1:2, .Label =
c("-1.27451",
"-0.5625"), class = "factor"), effect.Lower = structure(1:2, .Label =
c("-7.4542",
"-6.56364"), class = "factor"), effect.Upper = structure(1:2, .Label =
c("4.90518",
"5.43864"), class = "factor"), effect.Decision = structure(c(1L,
1L), .Label = "FTR H0", class = "factor"), effect.Adj..P.value =
structure(1:2, .Label = c("0.686047",
"0.85424"), class = "factor")), row.names = c("No", "Yes"), class =
"data.frame")
更新 2022/03/16
tidyverse 已经发展,这个解决方案也应该如此。
library("asbio")
#> Loading required package: tcltk
library("tidyverse")
# It is good practice to set the seed before simulating random data
# Otherwise we can't compare
set.seed(12345)
example.df <-
tibble(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp = rnorm(50, 10, 5), value1 = rnorm(50, 25, 5)
) %>%
# Make sure species is a factor as required by `pair.kw`
mutate(
species = factor(species)
)
example.df %>%
# We want to perform the test separately for each group
group_by(
treated
) %>%
group_modify(
# Perform test and extract summary
~ pairw.kw(.x$var1, .x$species, conf = 0.95)$summary
)
#> # A tibble: 2 × 6
#> # Groups: treated [2]
#> treated Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 No -2.67949 -8.12299 2.76402 FTR H0 0.334663
#> 2 Yes 4.07071 -2.98047 11.12188 FTR H0 0.257843
将这种方法扩展到 运行 6 个测试很简单,每个测试组和一个响应变量(var1
、value1
或 resp
).例如,我们可以将 tibble 从宽格式(三个响应列)转换为窄格式(三个响应按行堆叠),然后像上面一样进行。
responses <- c("value1", "var1", "resp")
example.df %>%
pivot_longer(
all_of(responses),
names_to = "variable"
) %>%
group_by(
treated, variable
) %>%
group_modify(
~ pairw.kw(.x$value, .x$species, conf = 0.95)$summary
)
#> # A tibble: 6 × 7
#> # Groups: treated, variable [6]
#> treated variable Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 No resp -1.46154 -6.90505 3.98197 FTR H0 0.598725
#> 2 No value1 4.62821 -0.8153 10.07171 FTR H0 0.095632
#> 3 No var1 -2.67949 -8.12299 2.76402 FTR H0 0.334663
#> 4 Yes resp 3.75758 -3.2936 10.80875 FTR H0 0.29627
#> 5 Yes value1 -2.97475 -10.02592 4.07643 FTR H0 0.408311
#> 6 Yes var1 4.07071 -2.98047 11.12188 FTR H0 0.257843
由 reprex package (v2.0.1)
于 2022 年 3 月 16 日创建旧解
这是一个 tidy 方法,可以同时 运行 进行多项测试。
library("asbio")
#> Loading required package: tcltk
library("tidyverse")
# It is good practice to set the seed before simulating random data
# Otherwise can't compare
set.seed(12345)
example.df <- tibble(
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE),
gender = sample(c("male", "female"), 50, replace = TRUE),
var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5)) %>%
# Make sure species is a factor as required by `pair.kw`
mutate(species = factor(species))
example.df %>%
# Nest the data for each treatment group
nest(- treated) %>%
# Run the test on each treatment group
mutate(test = map(data, ~ pairw.kw(.$var1, .$species, conf = 0.95))) %>%
# There is no broom::tidy method for objects of class pairw
# but we can extract the summary ourselves
mutate(summary = map(test, pluck, "summary")) %>%
# Cast all the factor columns in the summary table to character, to
# avoid a warning about converting factors to characters.
mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
unnest(summary)
#> # A tibble: 2 x 8
#> treated data test Diff Lower Upper Decision `Adj. P-value`
#> <chr> <list> <list> <chr> <chr> <chr> <chr> <chr>
#> 1 No <tibble [2~ <S3: pa~ -1.705~ -7.99~ 4.58~ FTR H0 0.595163
#> 2 Yes <tibble [2~ <S3: pa~ -1.145~ -6.45~ 4.16~ FTR H0 0.672655
将这种方法扩展到 运行 6 个测试很简单,每个测试组和一个响应变量(var1
、value1
或 resp
).例如,我们可以将 tibble 从宽格式(三个响应列)转换为窄格式(三个响应按行堆叠),然后像上面一样进行。
example.df %>%
# From wide format to narrow format
gather(varname, y, value1, var1, resp) %>%
# Nest the data for each treatment group and each variable
nest(- treated, - varname) %>%
# Run 6 tests = (number of treatments) * (number of response variables)
mutate(test = map(data, ~ pairw.kw(.$y, .$species, conf = 0.95))) %>%
# There is no broom::tidy method for objects of class pairw
# but we can extract the summary ourselves
mutate(summary = map(test, pluck, "summary")) %>%
# Cast all the factor columns in the summary table to character, to
# avoid a warning about converting factors to characters.
mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
unnest(summary)
#> # A tibble: 6 x 9
#> treated varname data test Diff Lower Upper Decision `Adj. P-value`
#> <chr> <chr> <list> <list> <chr> <chr> <chr> <chr> <chr>
#> 1 No value1 <tibbl~ <S3: ~ 3.127~ -3.1~ 9.41~ FTR H0 0.329969
#> 2 Yes value1 <tibbl~ <S3: ~ -1.33~ -6.65 3.97~ FTR H0 0.622065
#> 3 No var1 <tibbl~ <S3: ~ -1.70~ -7.9~ 4.58~ FTR H0 0.595163
#> 4 Yes var1 <tibbl~ <S3: ~ -1.14~ -6.4~ 4.16~ FTR H0 0.672655
#> 5 No resp <tibbl~ <S3: ~ 1.421~ -4.8~ 7.71~ FTR H0 0.657905
#> 6 Yes resp <tibbl~ <S3: ~ 1.145~ -4.1~ 6.45~ FTR H0 0.672655
由 reprex package (v0.2.1)
于 2019-03-04 创建如果您想灵活地设置回复的数量和名称怎么办?然后指定响应列表:
responses <- c("value1", "var1", "resp")
并在 gather
语句中使用 tidy evaluation 如下:
example.df %>%
# From wide format to narrow format, with tidy evaluation
gather(varname, y, !!!rlang::syms(responses))
# Rest of computation here.....