分组的 Tidyverse 解决方案，并通过数据帧循环执行 dunn.test

Question

我在下面有一个示例数据框。

example.df <- data.frame( 
species = sample(c("primate", "non-primate"), 50, replace = TRUE),
treated = sample(c("Yes", "No"), 50, replace = TRUE), 
gender = sample(c("male", "female"), 50, replace = TRUE), 
var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5))

我想先按 treated 分组，然后循环遍历数据中的所有数值变量以使用 species 作为解释变量，提取 summary 数据框对象并将 yes 和 no 子列表中的列绑定到新的数据框对象中。

我已经使用部分 "tidy" 方法获得了部分解决方案 and ，效果很好。我只是在寻找一个更优雅的 tidyverse 解决方案，以帮助我学习成为更好的 R 用户。

感谢任何帮助。

编辑：这是我从部分 "tidy" 解决方案得到的输出。

structure(list(var1.Diff = structure(1:2, .Label = c("-7.05229", 
"-2.25"), class = "factor"), var1.Lower = structure(1:2, .Label = 
c("-13.23198", 
"-8.25114"), class = "factor"), var1.Upper = structure(1:2, .Label = 
c("-0.87259", 
"3.75114"), class = "factor"), var1.Decision = structure(1:2, .Label = 
 c("Reject H0", 
 "FTR H0"), class = "factor"), var1.Adj..P.value = structure(1:2, .Label = 
 c("0.025305", 
 "0.462433"), class = "factor"), resp.Diff = structure(1:2, .Label = 
 c("1.10458", 
 "0"), class = "factor"), resp.Lower = structure(1:2, .Label = c("-5.07512", 
 "-6.00114"), class = "factor"), resp.Upper = structure(1:2, .Label = 
 c("7.28427", 
 "6.00114"), class = "factor"), resp.Decision = structure(c(1L, 
 1L), .Label = "FTR H0", class = "factor"), resp.Adj..P.value = 
 structure(1:2, .Label = c("0.726092", 
 "1"), class = "factor"), effect.Diff = structure(1:2, .Label = 
 c("-1.27451", 
 "-0.5625"), class = "factor"), effect.Lower = structure(1:2, .Label = 
 c("-7.4542", 
 "-6.56364"), class = "factor"), effect.Upper = structure(1:2, .Label = 
 c("4.90518", 
 "5.43864"), class = "factor"), effect.Decision = structure(c(1L, 
 1L), .Label = "FTR H0", class = "factor"), effect.Adj..P.value = 
 structure(1:2, .Label = c("0.686047", 
 "0.85424"), class = "factor")), row.names = c("No", "Yes"), class = 
 "data.frame")

Answer 1

更新 2022/03/16

tidyverse 已经发展，这个解决方案也应该如此。

library("asbio")
#> Loading required package: tcltk
library("tidyverse")

# It is good practice to set the seed before simulating random data
# Otherwise we can't compare
set.seed(12345)

example.df <-
  tibble(
    species = sample(c("primate", "non-primate"), 50, replace = TRUE),
    treated = sample(c("Yes", "No"), 50, replace = TRUE),
    gender = sample(c("male", "female"), 50, replace = TRUE),
    var1 = rnorm(50, 100, 5), resp = rnorm(50, 10, 5), value1 = rnorm(50, 25, 5)
  ) %>%
  # Make sure species is a factor as required by `pair.kw`
  mutate(
    species = factor(species)
  )

example.df %>%
  # We want to perform the test separately for each group
  group_by(
    treated
  ) %>%
  group_modify(
    # Perform test and extract summary
    ~ pairw.kw(.x$var1, .x$species, conf = 0.95)$summary
  )
#> # A tibble: 2 × 6
#> # Groups:   treated [2]
#>   treated Diff     Lower    Upper    Decision `Adj. P-value`
#>   <chr>   <chr>    <chr>    <chr>    <chr>    <chr>         
#> 1 No      -2.67949 -8.12299 2.76402  FTR H0   0.334663      
#> 2 Yes     4.07071  -2.98047 11.12188 FTR H0   0.257843

将这种方法扩展到运行 6 个测试很简单，每个测试组和一个响应变量（var1、value1 或 resp).例如，我们可以将 tibble 从宽格式（三个响应列）转换为窄格式（三个响应按行堆叠），然后像上面一样进行。

responses <- c("value1", "var1", "resp")

example.df %>%
  pivot_longer(
    all_of(responses),
    names_to = "variable"
  ) %>%
  group_by(
    treated, variable
  ) %>%
  group_modify(
    ~ pairw.kw(.x$value, .x$species, conf = 0.95)$summary
  )
#> # A tibble: 6 × 7
#> # Groups:   treated, variable [6]
#>   treated variable Diff     Lower     Upper    Decision `Adj. P-value`
#>   <chr>   <chr>    <chr>    <chr>     <chr>    <chr>    <chr>         
#> 1 No      resp     -1.46154 -6.90505  3.98197  FTR H0   0.598725      
#> 2 No      value1   4.62821  -0.8153   10.07171 FTR H0   0.095632      
#> 3 No      var1     -2.67949 -8.12299  2.76402  FTR H0   0.334663      
#> 4 Yes     resp     3.75758  -3.2936   10.80875 FTR H0   0.29627       
#> 5 Yes     value1   -2.97475 -10.02592 4.07643  FTR H0   0.408311      
#> 6 Yes     var1     4.07071  -2.98047  11.12188 FTR H0   0.257843

^{由 reprex package (v2.0.1)}

于 2022 年 3 月 16 日创建

旧解

这是一个 tidy 方法，可以同时运行进行多项测试。

library("asbio")
#> Loading required package: tcltk
library("tidyverse")

# It is good practice to set the seed before simulating random data
# Otherwise can't compare
set.seed(12345)

example.df <- tibble(
  species = sample(c("primate", "non-primate"), 50, replace = TRUE),
  treated = sample(c("Yes", "No"), 50, replace = TRUE),
  gender = sample(c("male", "female"), 50, replace = TRUE),
  var1 = rnorm(50, 100, 5), resp=rnorm(50, 10,5), value1 = rnorm (50, 25, 5)) %>%
  # Make sure species is a factor as required by `pair.kw`
  mutate(species = factor(species))

example.df %>%
  # Nest the data for each treatment group
  nest(- treated) %>%
  # Run the test on each treatment group
  mutate(test = map(data, ~ pairw.kw(.$var1, .$species, conf = 0.95))) %>%
  # There is no broom::tidy method for objects of class pairw
  # but we can extract the summary ourselves
  mutate(summary = map(test, pluck, "summary")) %>%
  # Cast all the factor columns in the summary table to character, to
  # avoid a warning about converting factors to characters.
  mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
  unnest(summary)
#> # A tibble: 2 x 8
#>   treated data        test     Diff    Lower  Upper Decision `Adj. P-value`
#>   <chr>   <list>      <list>   <chr>   <chr>  <chr> <chr>    <chr>         
#> 1 No      <tibble [2~ <S3: pa~ -1.705~ -7.99~ 4.58~ FTR H0   0.595163      
#> 2 Yes     <tibble [2~ <S3: pa~ -1.145~ -6.45~ 4.16~ FTR H0   0.672655

将这种方法扩展到运行 6 个测试很简单，每个测试组和一个响应变量（var1、value1 或 resp).例如，我们可以将 tibble 从宽格式（三个响应列）转换为窄格式（三个响应按行堆叠），然后像上面一样进行。

example.df %>%
  # From wide format to narrow format
  gather(varname, y, value1, var1, resp) %>%
  # Nest the data for each treatment group and each variable
  nest(- treated, - varname) %>%
  # Run 6 tests = (number of treatments) * (number of response variables)
  mutate(test = map(data, ~ pairw.kw(.$y, .$species, conf = 0.95))) %>%
  # There is no broom::tidy method for objects of class pairw
  # but we can extract the summary ourselves
  mutate(summary = map(test, pluck, "summary")) %>%
  # Cast all the factor columns in the summary table to character, to
  # avoid a warning about converting factors to characters.
  mutate(summary = map(summary, mutate_if, is.factor, as.character)) %>%
  unnest(summary)
#> # A tibble: 6 x 9
#>   treated varname data    test   Diff   Lower Upper Decision `Adj. P-value`
#>   <chr>   <chr>   <list>  <list> <chr>  <chr> <chr> <chr>    <chr>         
#> 1 No      value1  <tibbl~ <S3: ~ 3.127~ -3.1~ 9.41~ FTR H0   0.329969      
#> 2 Yes     value1  <tibbl~ <S3: ~ -1.33~ -6.65 3.97~ FTR H0   0.622065      
#> 3 No      var1    <tibbl~ <S3: ~ -1.70~ -7.9~ 4.58~ FTR H0   0.595163      
#> 4 Yes     var1    <tibbl~ <S3: ~ -1.14~ -6.4~ 4.16~ FTR H0   0.672655      
#> 5 No      resp    <tibbl~ <S3: ~ 1.421~ -4.8~ 7.71~ FTR H0   0.657905      
#> 6 Yes     resp    <tibbl~ <S3: ~ 1.145~ -4.1~ 6.45~ FTR H0   0.672655

^{由 reprex package (v0.2.1)}

于 2019-03-04 创建

如果您想灵活地设置回复的数量和名称怎么办？然后指定响应列表：

responses <- c("value1", "var1", "resp")

并在 gather 语句中使用 tidy evaluation 如下：

example.df %>%
  # From wide format to narrow format, with tidy evaluation
  gather(varname, y, !!!rlang::syms(responses))
  # Rest of computation here.....

分组的 Tidyverse 解决方案，并通过数据帧循环执行 dunn.test

Tidyverse solution to grouping, and looping through dataframe to perform dunn.test

r

asbio

dunn.test

tidyverse