R: Continue t.test in a map function even though there are not enough observations
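In my sample data I have 3 data frames. Each df has 2 variables (varA and varB) per threshold. There are 3 thresholds (1, 2, 3):
library(tidyverse)  # provides tibble(), map(), pull(), add_column() and %>%
df1 <- tibble(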
var1A= rnorm(1:10) +1,
var1B= rnorm(1:10) +1,
var2A= rnorm(1:10) +2,
var2B= rnorm(1:10) +2,
var3A= rnorm(1:10) +3,
var3B= rnorm(1:10) +3)
df2 <- tibble(
var1A= rnorm(1:10) +1,
var1B= rnorm(1:10) +1,
var2A= rnorm(1:10) +2,
var2B= rnorm(1:10) +2,
var3A= rnorm(1:10) +3,
var3B= rnorm(1:10) +3)
df3 <- tibble(
var1A= rnorm(1:10) +1,
var1B= NA,
var2A= rnorm(1:10) +2,
var2B= rnorm(1:10) +2,
var3A= rnorm(1:10) +3,
var3B= rnorm(1:10) +3)
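Now I want to run t.test(varA, varB) for each pair of variables and for each threshold (1, 2, 3).
Since I have more than one df, I put all dfs into a map function, map t.test over every df, and apply t.test for all thresholds: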
thresholds = c(1, 2, 3)
list_dfs = c('df1','df2','df3')
map(list_dfs,
function(df_name){
x <- get(df_name)
lapply(thresholds, function(i){
t.test(x %>%
pull(paste0("var",i,"A")),
x %>%
pull(paste0("var",i,"B")))
}) %>%
map_df(broom::tidy) %>%
add_column(.before = 'estimate',
df = df_name,
threshold = thresholds)
}) %>%
do.call(rbind, .)
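This code collects all results in one df. The problem is that var1B in df3 is empty: the whole column is NA, so t.test stops with an error and the whole map call fails.
How can I still run the map function although var1B does not have enough observations?
This is my desired output: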
# A tibble: 9 x 12
df threshold estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 df1 1 -0.582 0.992 1.57 -1.43 0.170 16.6 -1.44 0.276 Welch~
2 df1 2 0.271 2.75 2.48 0.654 0.522 17.8 -0.601 1.14 Welch~
3 df1 3 -0.250 3.12 3.37 -0.544 0.593 17.7 -1.22 0.716 Welch~
4 df2 1 -0.169 0.747 0.916 -0.407 0.690 15.3 -1.05 0.714 Welch~
5 df2 2 0.0259 1.94 1.91 0.0702 0.945 17.9 -0.748 0.800 Welch~
6 df2 3 0.496 3.28 2.79 1.11 0.281 17.5 -0.444 1.44 Welch~
7 df3 1 NA NA NA NA NA NA NA NA NA
8 df3 2 -0.274 1.99 2.26 -0.650 0.525 15.8 -1.17 0.622 Welch~
9 df3 3 0.407 3.34 2.93 0.920 0.371 16.6 -0.529 1.34 Welch~
Because varB is NA for threshold 1 in df3, row 7 of the output is NA as well.
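For reference, the failure can be reproduced in isolation. This is a minimal sketch in base R; the vectors x and y and the variable msg are made up for illustration and stand in for var1A and var1B of df3. It only shows that t.test raises an error when one sample has no usable observations, which is what aborts the map call above.

```r
set.seed(1)
x <- rnorm(10) + 1        # stands in for df3$var1A
y <- rep(NA_real_, 10)    # stands in for df3$var1B (all missing)

# t.test() drops the NAs, finds no observations left in y and signals an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch({
  t.test(x, y)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

Any solution therefore has to either check the inputs before calling t.test or catch the error and substitute NA.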
What I would do is combine the data.frames in a different format, so that the "A" parts are in one data.frame and the "B" parts are in another:
dfs <- cbind(df1=df1, df2=df2, df3=df3)
dfA <- dfs[,grep("A$", colnames(dfs))]
dfB <- dfs[,grep("B$", colnames(dfs))]
Then everything becomes much simpler:
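doTtest <- function(x, y) {
  # Welch's t.test needs at least 2 non-missing observations in each sample
  if (sum(!is.na(x)) >= 2 && sum(!is.na(y)) >= 2)
    broom::tidy(t.test(x, y))
  else
    rep(NA, 10)  # placeholder matching the 10 columns returned by broom::tidy
}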
res <- as.data.frame(t(mapply(doTtest, dfA, dfB)))
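The same idea can be sketched with base R only, which avoids the list columns that mapply produces when it mixes tibbles and plain vectors. This is an illustration under assumptions, not the answer's exact method: the helper t_row and the two small example data frames are hypothetical, and only three of the t.test statistics are kept.

```r
# Hypothetical helper: returns a one-row data.frame, filled with NA when
# either sample has fewer than 2 non-missing observations.
t_row <- function(x, y) {
  enough <- sum(!is.na(x)) >= 2 && sum(!is.na(y)) >= 2
  if (!enough) {
    return(data.frame(estimate = NA_real_, statistic = NA_real_, p.value = NA_real_))
  }
  tt <- t.test(x, y)
  data.frame(
    estimate  = unname(tt$estimate[1] - tt$estimate[2]),  # mean(x) - mean(y)
    statistic = unname(tt$statistic),
    p.value   = tt$p.value
  )
}

set.seed(42)
# Small stand-ins for dfA and dfB; the second "B" column is entirely missing.
dfA <- data.frame(df1.var1A = rnorm(10) + 1, df3.var1A = rnorm(10) + 1)
dfB <- data.frame(df1.var1B = rnorm(10) + 1, df3.var1B = NA_real_)

# Map() walks over the columns of dfA and dfB in parallel; rbind stacks the rows.
res <- do.call(rbind, Map(t_row, dfA, dfB))
print(res)
```

Because every call returns a data.frame with the same columns, the result is an ordinary data.frame with one row per pair of columns.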
Or you can use the convenient package matrixTests:
library(matrixTests)
> col_t_welch(dfA, dfB)
obs.x obs.y obs.tot mean.x mean.y mean.diff var.x var.y stderr df statistic pvalue conf.low conf.high alternative mean.null conf.level
df1.var1A 10 10 20 1.5436119 0.7488449 0.79476695 0.2993602 0.5481971 0.2911284 16.57158 2.7299537 0.01449227 0.1793279 1.4102060 two.sided 0 0.95
df1.var2A 10 10 20 2.2205661 2.2320260 -0.01145988 0.4832561 0.5249799 0.3175273 17.96923 -0.0360910 0.97160771 -0.6786419 0.6557222 two.sided 0 0.95
df1.var3A 10 10 20 3.0457651 2.7835908 0.26217424 1.2998193 1.9933106 0.5738580 17.23565 0.4568626 0.65347516 -0.9473005 1.4716490 two.sided 0 0.95
df2.var1A 10 10 20 1.7233471 1.2761199 0.44722715 0.9328694 1.3631385 0.4791668 17.38932 0.9333434 0.36342238 -0.5620050 1.4564593 two.sided 0 0.95
df2.var2A 10 10 20 1.9278754 2.6368740 -0.70899858 1.0966493 0.6907785 0.4227798 17.11741 -1.6769925 0.11170922 -1.6005202 0.1825230 two.sided 0 0.95
df2.var3A 10 10 20 3.1245106 2.9569952 0.16751542 1.0357228 0.8209887 0.4308958 17.76242 0.3887609 0.70207375 -0.7386317 1.0736625 two.sided 0 0.95
df3.var1A 10 0 10 0.6804275 NaN NaN 0.6015624 0.0000000 NaN NaN NA NA NA NA two.sided 0 0.95
df3.var2A 10 10 20 2.0143381 1.9223843 0.09195379 0.7837613 0.7611496 0.3930535 17.99614 0.2339472 0.81766669 -0.7338338 0.9177413 two.sided 0 0.95
df3.var3A 10 10 20 3.0156624 3.2768350 -0.26117263 1.5437758 1.2608029 0.5295827 17.81860 -0.4931668 0.62791751 -1.3745971 0.8522518 two.sided 0 0.95
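Another possibility is to wrap t.test in nested if/else statements.
If the sum of variable A and the sum of variable B are both non-zero, run t.test; otherwise return NA. This works here because a column that is entirely NA sums to 0 when na.rm = TRUE.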
map(list_dfs,
function(df_name){
x <- get(df_name)
lapply(thresholds, function(i){
if(sum(x%>%pull(paste0("var",i,"A")), na.rm = T) != 0){
if(sum(x%>%pull(paste0("var",i,"B")), na.rm = T) != 0){
t.test(x %>%
pull(paste0("var",i,"A")),
x %>%
pull(paste0("var",i,"B")))
} else NA
} else NA
}) %>%
map_df(broom::tidy)%>%
add_column(.before = 'estimate',
df = df_name,
threshold = thresholds)
}) %>% bind_rows()
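Instead of testing the sums, one could also catch the error itself. The following is a sketch that assumes the purrr, tibble and broom packages are installed; safe_tidy_ttest is a hypothetical name, and the example uses a reduced df3 with only two thresholds. purrr::possibly() wraps a function so that it returns a fallback value instead of failing.

```r
library(purrr)
library(tibble)
library(broom)

set.seed(1)
# Reduced version of df3: var1B is entirely missing.
df3 <- tibble(
  var1A = rnorm(10) + 1,
  var1B = NA_real_,
  var2A = rnorm(10) + 2,
  var2B = rnorm(10) + 2
)

# Hypothetical wrapper: tidy t.test result, or a one-row NA tibble on error.
safe_tidy_ttest <- possibly(
  function(x, y) tidy(t.test(x, y)),
  otherwise = tibble(estimate = NA_real_)
)

res <- map_dfr(c(1, 2), function(i) {
  out <- safe_tidy_ttest(df3[[paste0("var", i, "A")]],
                         df3[[paste0("var", i, "B")]])
  add_column(out, threshold = i, .before = 1)
})
print(res)
```

Rows are bound by column name, so the columns missing from the fallback tibble are filled with NA for the failed threshold. Unlike the sum check, this also covers other reasons for t.test to fail, at the cost of hiding the error message.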