Handling Errors for Multiple T-tests when using Tidy and Do

Question

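I have been using the functions do and tidy to run multiple t-tests on a data frame that is grouped by category beforehand. However, some categories in the data frame contain constant values, and I usually have to filter them out to make the test work, because otherwise it returns Error in t.test.default(.$y) : data are essentially constant. I am looking for a way to run the t-tests with do and tidy as usual, but instead of filtering out the constant categories, put that category's value in the estimate column and NA in the other columns.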
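For reference, the error can be reproduced in isolation. The following is a minimal sketch, assuming only base R; the names constant_y and msg are illustrative and do not appear in the original post. It captures the error message instead of stopping, which is the same mechanism the tryCatch attempts further down rely on.

```r
# A constant vector has zero variance, so the t statistic is undefined
# and t.test() stops with an error instead of returning a result.
constant_y <- c(1, 1, 1, 1)

msg <- tryCatch(
  {
    t.test(constant_y)
    "no error"
  },
  # Return the error message as a string rather than aborting
  error = function(e) conditionMessage(e)
)

print(msg)
```

Any group whose y values are all identical should hit the same error, which is why category A has to be handled separately.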

Example data frame:

trial<-data.frame(
  type=rep(c("A","B","C"),times=2,length.out=10),
  y=c(1,2,3,1,3,2,1,2,3,1)
)

T-test after filtering:

library(dplyr)
library(broom)
trial.ttest<-trial %>% 
  group_by(type) %>% 
  filter(!type=="A") %>% 
  do(tidy(t.test(.$y)))

T-test results after filtering:

  type estimate statistic    p.value parameter  conf.low
1    B 2.333333         7 0.01980394         2 0.8991158
2    C 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1  3.767551 One Sample t-test   two.sided
2  4.100884 One Sample t-test   two.sided

I tried the following code, which uses tryCatch and tribble to do this, but I ended up with the error Error: C stack usage 15923504 is too close to the limit.

trial.ttest<-trial %>% 
  group_by(type) %>% 
  do(tidy(tryCatch(t.test(.$y),error=function(e){
    tribble(
      ~estimate,
      .$y,NA,NA,NA,NA,NA,NA,NA
    )
  })))

If I drop tribble and simply return the value from the tryCatch handler, it ends up in a new column named x.

Code:

trial.ttest<-trial %>% 
  group_by(type) %>% 
  do(tidy(tryCatch(t.test(.$y),error=function(e){
    .$y
  })))

Result:

  type  x estimate statistic    p.value parameter  conf.low
1    A  1       NA        NA         NA        NA        NA
2    A  1       NA        NA         NA        NA        NA
3    A  1       NA        NA         NA        NA        NA
4    A  1       NA        NA         NA        NA        NA
5    B NA 2.333333         7 0.01980394         2 0.8991158
6    C NA 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1        NA              <NA>        <NA>
2        NA              <NA>        <NA>
3        NA              <NA>        <NA>
4        NA              <NA>        <NA>
5  3.767551 One Sample t-test   two.sided
6  4.100884 One Sample t-test   two.sided

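Is there a way to have the constant value go into the estimate column, with the method column reading Constant Value and all other columns NA, rather than ending up in a new column as in the last piece of code?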

Edit 1:

I forgot to add the desired result data frame.

Desired result:

  type estimate statistic    p.value parameter  conf.low conf.high            method alternative
1    A 1.000000        NA         NA        NA        NA        NA    Constant Value        <NA>
2    B 2.333333         7 0.01980394         2 0.8991158  3.767551 One Sample t-test   two.sided
3    C 2.666667         8 0.01526807         2 1.2324491  4.100884 One Sample t-test   two.sided

Edit 2: Attempted solution A

Code:

library(dplyr)
library(purrr)
library(broom)
trial %>% 
  split(.$type) %>% 
  map_if(.p = ~length(unique(.$y))>1, 
         .f = ~tidy(t.test(.$y)), 
         .else = ~tibble(estimate=.$y[1], method="Constant Value")) %>% 
  bind_rows(.id = 'type')

Result:

  type type  y estimate statistic    p.value parameter  conf.low
1    A    A  1       NA        NA         NA        NA        NA
2    A    A  1       NA        NA         NA        NA        NA
3    A    A  1       NA        NA         NA        NA        NA
4    A    A  1       NA        NA         NA        NA        NA
5    B <NA> NA 2.333333         7 0.01980394         2 0.8991158
6    C <NA> NA 2.666667         8 0.01526807         2 1.2324491
  conf.high            method alternative
1        NA              <NA>        <NA>
2        NA              <NA>        <NA>
3        NA              <NA>        <NA>
4        NA              <NA>        <NA>
5  3.767551 One Sample t-test   two.sided
6  4.100884 One Sample t-test   two.sided

Answer 1

Here is an option using purrr::map_if, where we apply t.test only when length(unique(y)) > 1 (or, equivalently, n_distinct(y) > 1):

library(dplyr)
library(purrr)
library(broom)
trial %>% 
  split(.$type) %>% 
  map_if(.p = ~length(unique(.$y))>1, 
         .f = ~tidy(t.test(.$y)), 
         .else = ~tibble(estimate=.$y[1], method="Constant Value")) %>% 
  bind_rows(.id = 'type')

# A tibble: 3 x 9
  type  estimate method            statistic p.value parameter conf.low conf.high alternative
  <chr>    <dbl> <chr>                 <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>      
1 A         1    Constant Value          NA  NA             NA   NA         NA    NA         
2 B         2.33 One Sample t-test        7.  0.0198         2    0.899      3.77 two.sided  
3 C         2.67 One Sample t-test        8   0.0153         2    1.23       4.10 two.sided

PS: This uses purrr >= 0.3.2.
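As an alternative that stays closer to the do and tidy workflow from the question, the tryCatch can be moved so that tidy is called inside it. In the question's attempts the handler's return value is itself passed to tidy, which appears to be where the extra x column comes from. The following is a sketch under that assumption, not a definitive implementation; safe_ttest is a hypothetical helper name, not part of broom or dplyr.

```r
library(dplyr)
library(broom)

trial <- data.frame(
  type = rep(c("A", "B", "C"), length.out = 10),
  y = c(1, 2, 3, 1, 3, 2, 1, 2, 3, 1)
)

# Hypothetical helper: tidy the t-test when it succeeds; on error,
# return a one-row tibble holding the first value as the estimate.
# Note: this fallback fires on ANY t.test error, not only constant data.
safe_ttest <- function(y) {
  tryCatch(
    tidy(t.test(y)),
    error = function(e) tibble(estimate = y[1], method = "Constant Value")
  )
}

result <- trial %>%
  group_by(type) %>%
  do(safe_ttest(.$y)) %>%   # do() row-binds the per-group results
  ungroup()

print(result)
```

Columns missing from the fallback row are filled with NA when the groups are combined. Because the handler catches every error, the explicit check on the number of distinct values used with map_if above is the stricter choice if other failures should still surface.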
