Find differences among groups based on a condition in dplyr
I have a data frame that looks like this, but the real one is much larger.
df <- data.frame(gene = c("A","B","F","A","D","E","B","C","D","G"),
                 group = c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df
gene group
A group1
B group1
F group1
A group2
D group2
E group2
B group3
C group3
D group3
G group3
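Based on the gene column, I want to find the genes that are unique to the groups containing gene "A", that is, genes that do not also occur in any group without gene "A". Gene A itself should be left out of the result.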
I want my data to look like this after filtering:
gene group
F group1
E group2
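This is because F and E are the only genes (apart from A) that occur in a group containing gene A and do not occur in any group without it.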
We can filter the rows into two sets, those from groups whose 'gene' column contains 'A' and those from groups without 'A', and then do an anti_join.
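library(dplyr)

# rows from every group that contains gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])

# rows from every group that does not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()

# keep genes found only in the 'A' groups, then drop 'A' itself
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')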
Output:
gene group
1 F group1
2 E group2
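For comparison, the same filtering could probably be written as a single pipeline without the two temporary data frames. This is only a sketch under assumptions: the names `target`, `has_target` and `result` are made up for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# sample data from the question
df <- data.frame(
  gene  = c("A","B","F","A","D","E","B","C","D","G"),
  group = c("group1","group1","group1","group2","group2","group2",
            "group3","group3","group3","group3")
)

target <- "A"  # hypothetical name for the gene of interest

result <- df %>%
  group_by(group) %>%
  # flag every row of a group that contains the target gene
  mutate(has_target = target %in% gene) %>%
  group_by(gene) %>%
  # keep genes seen only in flagged groups, and drop the target itself
  filter(all(has_target), gene != target) %>%
  ungroup() %>%
  select(gene, group)

result
```

On the sample data this should return the same two rows as the anti_join approach (F in group1, E in group2). Whether it is faster on a large data frame has not been measured here.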