Running t.test() on multiple columns and outputting a tibble

Question

I have a data frame like this:

record_id   group      enzyme1     enzyme2  ... ... 
            <factor>   <dbl>       <dbl>    ... ... 
1           control    34.5        32.3     ... ...
2           control    32.1        34.1     ... ...
3           treatment  123.1       12.1     ... ...

Basically, it has one grouping variable called group and several dependent variables such as enzyme1, enzyme2, and so on.

I can run a single t-test and wrap the result in a tibble like this:

library(broom)
tidy(t.test(enzyme1 ~ group, data = df))  # df is the data frame shown above
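For reference, here is a self-contained version of that single test. It assumes the data frame is called df and borrows the four sample rows given with Answer 4 below (not the asker's real data). broom::tidy() turns the htest object into a one-row tibble; for a Welch two-sample test, estimate is the difference of the group means (control minus treatment here), and estimate1/estimate2 are the group means themselves:

```r
library(broom)

# Hypothetical df in the question's layout; values taken from the data shown with Answer 4
df <- data.frame(
  record_id = 1:4,
  group     = factor(c("control", "control", "treatment", "treatment")),
  enzyme1   = c(34.5, 32.1, 123.1, 152.1),
  enzyme2   = c(32.3, 34.1, 12.1, 15.1)
)

# One Welch two-sample t-test; data = df tells t.test() where to find the columns
res <- tidy(t.test(enzyme1 ~ group, data = df))
res
```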

What I want is to stack all the t-test outputs together so that they look like this:

              estimate   statistic  p.value  parameter  conf.low   conf.high
enzyme 1      197.7424   0.3706244  0.7119  75.3982  -865.0291  1260.514
enzyme 2      XXX.XX     X.xxx      0.XXXX  XX.XXXX  -XX.XXX    XX.XXX

and so on.

Any ideas?

Answer 1

We can use purrr::map_df(), which is loaded by library(tidyverse), like this:

library(broom)
library(tidyverse) # purrr is in here
data(mtcars)

#reproducible data to simulate your case
mtcars2 <- filter(mtcars, cyl %in% c(4, 6)) 
mtcars2$cyl <- as.factor(mtcars2$cyl)

# capture the columns you want to t.test
cols_not_cyl <- names(mtcars2)[-2]

# turn those column names into formulas
formulas <- paste(cols_not_cyl, "~ cyl") %>%
    map(as.formula) %>% # needs to be class formula
    set_names(cols_not_cyl) # useful for map_df()

# do the tests, then stack them all together
map_df(formulas, ~ tidy(t.test(formula = ., data = mtcars2)),
       .id = "column_id")
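A different route, offered as an alternative rather than as part of this answer, is to reshape to long format with tidyr::pivot_longer() so that no formulas have to be built from strings: each enzyme becomes a group, and dplyr::group_modify() runs one test per group. The small df below is hypothetical (it reuses the sample rows from Answer 4):

```r
library(broom)
library(dplyr)
library(tidyr)

# Hypothetical data in the question's layout
df <- data.frame(
  record_id = 1:4,
  group     = factor(c("control", "control", "treatment", "treatment")),
  enzyme1   = c(34.5, 32.1, 123.1, 152.1),
  enzyme2   = c(32.3, 34.1, 12.1, 15.1)
)

results <- df %>%
  # one row per (record, enzyme) pair, with columns `enzyme` and `value`
  pivot_longer(starts_with("enzyme"), names_to = "enzyme", values_to = "value") %>%
  group_by(enzyme) %>%
  # .x holds the rows for one enzyme; tidy() returns a one-row tibble per group
  group_modify(~ tidy(t.test(value ~ group, data = .x))) %>%
  ungroup()

results
```

The enzyme column plays the role of column_id above, and any column matched by starts_with("enzyme") is picked up automatically.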

Answer 2

Use map() to run all the tests, then bind the results together with reduce():

library(broom)
library(dplyr)
library(purrr)

df <- data.frame(record_id = 1:50,
                 group = sample(c("control", "treatment"), 50, replace = TRUE),
                 enzyme1 = rnorm(50),
                 enzyme2 = rnorm(50))

map(paste0("enzyme", 1:2), ~ tidy(t.test(as.formula(paste0(.x, " ~ group")), data = df))) %>%
  reduce(bind_rows)
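One thing the reduce(bind_rows) version loses is which row belongs to which enzyme. A sketch of one way to keep that label, assuming the same simulated data: name the vector of column names with purrr::set_names(), let map() carry the names through, and pass .id to bind_rows() so the names become a column. The set.seed() call is an addition, only there to make the simulated data reproducible:

```r
library(broom)
library(dplyr)
library(purrr)

set.seed(1)  # assumption: any seed works; it only fixes the simulated data
df <- data.frame(record_id = 1:50,
                 group = sample(c("control", "treatment"), 50, replace = TRUE),
                 enzyme1 = rnorm(50),
                 enzyme2 = rnorm(50))

enzymes <- paste0("enzyme", 1:2)

results <- enzymes %>%
  set_names() %>%  # c(enzyme1 = "enzyme1", enzyme2 = "enzyme2")
  map(~ tidy(t.test(as.formula(paste0(.x, " ~ group")), data = df))) %>%
  bind_rows(.id = "enzyme")  # list names become an `enzyme` column

results
```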

Answer 3

You can create an empty data.frame and then append each result to it with rbind() inside a loop.

Here is an example using the iris dataset:

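library(broom)

df <- data.frame()
for (i in 1:(ncol(iris) - 1)) { # change the range to cover the columns you want to test
  variableName <- colnames(iris)[i] # loop through the desired column names
  f <- as.formula(paste(variableName, "~ Species")) # build the formula from the current column
  # rows 1-100 are setosa and versicolor only, so Species has exactly two groups
  df <- rbind(df, cbind(variableName, tidy(t.test(f, data = iris[1:100, ]))))
}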
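Growing a data frame with rbind() inside a loop copies it on every iteration, which gets slow with many columns. A common base-R alternative, sketched here under the same two-species assumption, is to collect one tidied row per variable in a list with lapply() and bind them once with do.call(rbind, ...). Each variable's formula is built from its own column name:

```r
library(broom)

# Rows 1-100 of iris are setosa and versicolor; droplevels() removes the empty virginica level
iris2 <- droplevels(iris[1:100, ])
vars <- setdiff(names(iris2), "Species")  # the four numeric measurement columns

# One tidy row per variable, collected in a list instead of grown inside a loop
rows <- lapply(vars, function(v) {
  res <- tidy(t.test(as.formula(paste(v, "~ Species")), data = iris2))
  cbind(variableName = v, res)
})

result <- do.call(rbind, rows)  # bind all rows in a single call
result
```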

Answer 4

You could also try a tidyverse approach like this:

df %>% 
    summarise_at(vars(starts_with('enzyme')), funs(list(tidy(t.test(. ~ group))))) %>% 
    map(1) %>% bind_rows(.id='enzymes')

#  enzymes estimate estimate1 estimate2 statistic    p.value parameter   conf.low conf.high                  method alternative
#1 enzyme1   -104.3      33.3     137.6 -7.168597 0.08610502  1.013697 -283.37000  74.77000 Welch Two Sample t-test   two.sided
#2 enzyme2     19.6      33.2      13.6 11.204574 0.01532388  1.637394   10.22717  28.97283 Welch Two Sample t-test   two.sided

Data:

df <- read.table(text = "record_id   group      enzyme1     enzyme2
1           control    34.5        32.3
2           control    32.1        34.1
3           treatment  123.1       12.1  
4           treatment  152.1       15.1  ", header=T)
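For comparison, the exact layout from the question (one row per enzyme, with enzyme names as row names) can also be produced in base R without broom. This sketch uses the data above, builds each formula with stats::reformulate(), and pulls the fields out of each htest object by hand; the choice of fields mirrors the question's table and is an assumption about what is needed:

```r
df <- read.table(text = "record_id group enzyme1 enzyme2
1 control 34.5 32.3
2 control 32.1 34.1
3 treatment 123.1 12.1
4 treatment 152.1 15.1", header = TRUE)

enzymes <- grep("^enzyme", names(df), value = TRUE)  # every column starting with "enzyme"

# sapply() returns one column per enzyme; t() turns that into one row per enzyme
stats <- t(sapply(enzymes, function(v) {
  tt <- t.test(reformulate("group", response = v), data = df)
  c(estimate  = unname(tt$estimate[1] - tt$estimate[2]),  # mean(control) - mean(treatment)
    statistic = unname(tt$statistic),
    p.value   = tt$p.value,
    parameter = unname(tt$parameter),                      # Welch degrees of freedom
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2])
}))

stats
```

The result is a numeric matrix rather than a tibble; wrap it in as.data.frame() if a data frame is needed.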
