Find differences among groups based on a condition in dplyr

I have a data frame that looks like this, but the real one is much larger.

df = data.frame(gene=c("A","B","F","A","D","E","B","C","D","G"),
                group=c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df

 gene    group
   A     group1
   B     group1
   F     group1
   A     group2
   D     group2
   E     group2
   B     group3
   C     group3
   D     group3
   G     group3

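Based on the gene column, I want to find the unique differences between the groups that contain gene "A" and the groups that do not contain gene "A".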

After filtering, I want my data to look like this:

gene group
 F    group1
 E    group2

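This is because F occurs only in a group that contains gene A and in no other group. The same holds for E, while B and D are dropped because they also occur in group3, which has no gene A.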

We can filter the rows into the groups whose 'gene' column contains 'A' and the groups that have no 'A', then do an anti_join:

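library(dplyr)

# Rows from every group that contains gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])

# Rows from every group that does not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()

# Drop genes that also occur in a group without 'A', then drop 'A' itself
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')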

-output

 gene  group
1    F group1
2    E group2
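
As a cross-check, the same logic can probably be written as a single pipeline without the temporary data frames. This is only a sketch under the assumptions of the example data above: the helper column `has_A` and the name `result` are made up for illustration and are not part of the original answer. It flags each row by whether its group contains 'A', then keeps only the genes that never appear in a group without 'A'.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  gene  = c("A", "B", "F", "A", "D", "E", "B", "C", "D", "G"),
  group = c("group1", "group1", "group1", "group2", "group2", "group2",
            "group3", "group3", "group3", "group3")
)

result <- df %>%
  group_by(group) %>%
  mutate(has_A = "A" %in% gene) %>%  # hypothetical flag: TRUE for all rows of a group containing 'A'
  group_by(gene) %>%
  filter(all(has_A)) %>%             # keep genes that only occur in groups containing 'A'
  ungroup() %>%
  filter(gene != "A") %>%            # drop 'A' itself
  select(gene, group)

result
```

On the example data this should give the same two rows as the anti_join version, returned as a tibble. One difference in design: the grouped version makes a single pass over one data frame, which may be easier to extend if more conditions per group are needed.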