How to pair rows in a data frame with many columns using dplyr in R?
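I have a data frame containing multiple observations from a control cohort and an experimental cohort, with replicates for each subject.
Here is an example of my data frame: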
subject cohort replicate val1 val2
A control 1 10 0.1
A control 2 15 0.3
A experim 1 40 0.7
A experim 2 45 0.9
B control 1 5 0.3
B experim 1 30 0.0
C control 1 50 0.5
C experim 1 NA 1.0
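For each value column, I would like to pair every control observation with its corresponding experimental observation and compute the ratio (experimental divided by control) within each pair. The desired output looks like this: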
subject replicate ratio_val1 ratio_val2
A 1 4 7
A 2 3 3
B 1 6 0
C 1 NA 2
Ideally, I would like to see this implemented with dplyr and pipes.
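After grouping the data by subject and replicate, you can use the summarize_at function from dplyr to summarise the columns val1 and val2. Use [cohort == ...] to pick out the experimental and control values in each group and divide one by the other: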
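library(dplyr)
# Note: funs() has been deprecated since dplyr 0.8.0; on newer versions
# list(ratio = ~ .[cohort == "experim"] / .[cohort == "control"]) is the equivalent form.
df %>%
    group_by(subject, replicate) %>%
    summarize_at(vars(contains('val')),
                 funs("ratio" = .[cohort == "experim"] / .[cohort == "control"]))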
# Source: local data frame [4 x 4]
# Groups: subject [?]
#
# subject replicate val1_ratio val2_ratio
# <fctr> <int> <dbl> <dbl>
# 1 A 1 4 7
# 2 A 2 3 3
# 3 B 1 6 0
# 4 C 1 NA 2
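The same grouped division can also be written with across(), which superseded the summarize_at()/funs() pair. This is a minimal sketch that assumes dplyr 1.0.0 or later and assumes each subject/replicate group holds exactly one control row and one experim row; the data frame is rebuilt by hand from the question, and the name ratios is only illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt by hand
df <- data.frame(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0)
)

ratios <- df %>%
  group_by(subject, replicate) %>%
  summarise(
    # For every val* column, divide the experimental value by the control value
    across(starts_with("val"),
           ~ .x[cohort == "experim"] / .x[cohort == "control"],
           .names = "ratio_{.col}"),
    .groups = "drop"
  )

print(ratios)
```

The .names argument produces ratio_val1 and ratio_val2, matching the column names asked for in the question. The values agree with the desired output up to floating-point rounding.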
We can also use data.table, by reshaping the dataset to 'wide' format.
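library(data.table)
# Cast to wide (one val*_control and one val*_experim column per value),
# add the ratio columns by reference, then drop the four intermediate columns.
dcast(setDT(df1), subject + replicate ~ cohort, value.var = c("val1", "val2"))[,
    paste0("ratio_", names(df1)[4:5]) := Map(`/`,
        .SD[, grep("experim", names(.SD)), with = FALSE],
        .SD[, grep("control", names(.SD)), with = FALSE])][,
    (3:6) := NULL][]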
# subject replicate ratio_val1 ratio_val2
# 1: A 1 4 7
# 2: A 2 3 3
# 3: B 1 6 0
# 4: C 1 NA 2
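Or, after grouping by 'subject' and 'replicate', we loop over the 'val' columns and divide the elements of each column where cohort is 'experim' by those where it is 'control':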
setDT(df1)[, lapply(.SD[, grep("val", names(.SD)), with = FALSE],
                    function(x) x[cohort == "experim"] / x[cohort == "control"]),
           by = .(subject, replicate)]
Or we can use gather/spread from tidyr:
library(dplyr)
library(tidyr)
df1 %>%
    gather(Var, Val, val1:val2) %>%       # long: one row per subject/cohort/replicate/variable
    spread(cohort, Val) %>%               # wide: control and experim side by side
    group_by(subject, replicate, Var) %>%
    summarise(ratio = experim / control) %>%
    spread(Var, ratio)
# subject replicate val1 val2
# <chr> <int> <dbl> <dbl>
# 1 A 1 4 7
# 2 A 2 3 3
# 3 B 1 6 0
# 4 C 1 NA 2
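In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The reshape above could be sketched as follows, assuming tidyr 1.0.0 or later; the data frame is rebuilt by hand from the question, and the name ratios_wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt by hand
df1 <- data.frame(
  subject   = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cohort    = c("control", "control", "experim", "experim",
                "control", "experim", "control", "experim"),
  replicate = c(1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L),
  val1      = c(10, 15, 40, 45, 5, 30, 50, NA),
  val2      = c(0.1, 0.3, 0.7, 0.9, 0.3, 0.0, 0.5, 1.0)
)

ratios_wide <- df1 %>%
  # Long format: one row per subject/cohort/replicate/variable
  pivot_longer(starts_with("val"), names_to = "Var", values_to = "Val") %>%
  # Put the control and experim values side by side
  pivot_wider(names_from = cohort, values_from = Val) %>%
  mutate(ratio = experim / control) %>%
  select(-control, -experim) %>%
  # Back to one column per variable, prefixed as in the desired output
  pivot_wider(names_from = Var, values_from = ratio, names_prefix = "ratio_")

print(ratios_wide)
```

Because the variables are selected with starts_with("val"), the same pipeline should carry over to a data frame with many more value columns, provided they share that prefix.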