Running t.test() on multiple columns to output a tibble
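I have a data frame like this:
record_id  group      enzyme1  enzyme2  ... ...
           <factor>   <dbl>    <dbl>    ... ...
1          control    34.5     32.3     ... ...
2          control    32.1     34.1     ... ...
3          treatment  123.1    12.1     ... ...
Basically, there is one grouping variable named group and several dependent variables such as enzyme1, and so on.
I can run a single t-test and wrap the result in a tibble like this: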
tidy(t.test(enzyme1 ~ group))
I would like to stack all of the t-test outputs on top of each other, so that the result looks like this:
          estimate  statistic  p.value  parameter  conf.low   conf.high
enzyme 1  197.7424  0.3706244  0.7119   75.3982    -865.0291  1260.514
enzyme 2  XXX.XX    X.xxx      0.XXXX   XX.XXXX    -XX.XXX    XX.XXX
and so on.
Any ideas?
We can use purrr::map_df(), which is loaded by library(tidyverse), like this:
library(broom)
library(tidyverse) # purrr is in here
data(mtcars)
#reproducible data to simulate your case
mtcars2 <- filter(mtcars, cyl %in% c(4, 6))
mtcars2$cyl <- as.factor(mtcars2$cyl)
# capture the columns you want to t.test
cols_not_cyl <- names(mtcars2)[-2]
# turn those column names into formulas
formulas <- paste(cols_not_cyl, "~ cyl") %>%
  map(as.formula) %>%        # needs to be class formula
  set_names(cols_not_cyl)    # useful for map_df()
# do the tests, then stack them all together
map_df(formulas, ~ tidy(t.test(formula = ., data = mtcars2)),
       .id = "column_id")
Use map to compute all of the tests, then reduce to bind them together:
df <- data.frame(record_id = 1:50, group = sample(c("control", "treatment"), 50, replace = TRUE),
enzyme1 = rnorm(50),
enzyme2 = rnorm(50))
library(broom)
library(dplyr)
library(purrr)
map(paste0("enzyme", 1:2), ~tidy(t.test(as.formula(paste0(.x, "~ group")),
data = df))) %>%
reduce(bind_rows)
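One thing the map/reduce version above does not keep is which enzyme each row belongs to. A minimal sketch of one way to retain that label, and to trim the result to the columns asked for in the question, might look like the following. The seed, the balanced group assignment, and the names vars and results are assumptions made for illustration; they are not part of the original answer.

```r
library(broom)
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(
  record_id = 1:50,
  group     = rep(c("control", "treatment"), each = 25),  # assumption: balanced groups
  enzyme1   = rnorm(50),
  enzyme2   = rnorm(50)
)

vars <- paste0("enzyme", 1:2)  # hypothetical name for the columns to test

results <- vars %>%
  set_names() %>%                      # name each element after itself
  map(~ tidy(t.test(reformulate("group", response = .x), data = df))) %>%
  bind_rows(.id = "variable") %>%      # list names become the "variable" column
  select(variable, estimate, statistic, p.value, parameter, conf.low, conf.high)

results
```

Naming the list before binding is what lets bind_rows() fill in the variable column, so no separate bookkeeping is needed.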
You can create an empty data.frame and then use rbind() inside a loop to append your results to it.
Here is an example using the iris dataset:
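library(broom)
df <- data.frame()
for (i in 1:(length(colnames(iris)) - 1)) {  # change the range to match the columns you want to test
  variableName <- colnames(iris)[i]          # loop through the desired column names
  # build the formula from the current column so each iteration tests a different variable
  test <- t.test(as.formula(paste(variableName, "~ Species")), data = iris[1:99, ])
  df <- rbind(df, cbind(variableName, tidy(test)))
}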
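The same loop can also be written without growing a data frame row by row, by collecting the per-column results in a list and binding them once at the end. This is only a sketch in base R plus broom; the names two_species, vars, and results are hypothetical, and it assumes that every column other than Species should be tested.

```r
library(broom)

# First 99 rows of iris contain only setosa and versicolor;
# droplevels() removes the unused third level of Species.
two_species <- droplevels(iris[1:99, ])
vars <- setdiff(colnames(two_species), "Species")

results <- do.call(rbind, lapply(vars, function(v) {
  # reformulate() builds e.g. Sepal.Length ~ Species from the column name
  test <- t.test(reformulate("Species", response = v), data = two_species)
  cbind(variableName = v, tidy(test))
}))

results
```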
You could also try a tidyverse approach like this:
df %>%
summarise_at(vars(starts_with('enzyme')), funs(list(tidy(t.test(. ~ group))))) %>%
map(1) %>% bind_rows(.id='enzymes')
# enzymes estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
#1 enzyme1 -104.3 33.3 137.6 -7.168597 0.08610502 1.013697 -283.37000 74.77000 Welch Two Sample t-test two.sided
#2 enzyme2 19.6 33.2 13.6 11.204574 0.01532388 1.637394 10.22717 28.97283 Welch Two Sample t-test two.sided
Data:
df <- read.table(text = "record_id group enzyme1 enzyme2
1 control 34.5 32.3
2 control 32.1 34.1
3 treatment 123.1 12.1
4 treatment 152.1 15.1 ", header=T)
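The funs() helper used in the summarise_at() approach above has been deprecated in more recent dplyr releases. A rough equivalent, assuming tidyr is available and that reshaping to long format is acceptable, is sketched below. The column names enzymes and value and the object name results are chosen here for illustration.

```r
library(broom)
library(dplyr)
library(tidyr)

df <- read.table(text = "record_id group enzyme1 enzyme2
1 control 34.5 32.3
2 control 32.1 34.1
3 treatment 123.1 12.1
4 treatment 152.1 15.1", header = TRUE)

results <- df %>%
  # one row per (record, enzyme) pair
  pivot_longer(starts_with("enzyme"), names_to = "enzymes", values_to = "value") %>%
  group_by(enzymes) %>%
  # run one t-test per enzyme; .x is the subset of rows for that enzyme
  group_modify(~ tidy(t.test(value ~ group, data = .x))) %>%
  ungroup()

results
```

Because the enzyme name is carried as a grouping column, it ends up in the result without a separate .id step.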