Retrieve rows whose value in column B occurs for both factors in column A, and find which assets males do not share with females

Column A holds two factors, male and female. Column B holds combinations of five assets, a to e.

df <- data.frame(ID = c(1:7),                                   
                 gender = c("male","male", "male", "female", "female","female","female"),               
                 assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))  

   

How can I retrieve the rows where males and females share the same asset combination?

I have no idea how to write the R syntax for this. Here is what I tried:

sameassets <- df %>% filter(filter(gender="male",assets) == filter(gender="female",assets))

Expected output:

sameassests <- data.frame(ID = c(2,5,3,4),                                   
                          gender = c("male", "female", "male", "female"),               
                          assets = c("a,b,e,d", "a,b,e,d", "b,c,e","b,c,e"))

Can anyone help?

Edit to include a further question: I would also like to know which asset combinations males and females do not share.

So the desired output looks like this:

diffassests <- data.frame(ID = c(1,6,7),                                   
                              gender = c("male", "female", "female"),               
                              assets = c("a,e", "c,d", "a,d"))

You can do this by grouping by assets and then filtering out every asset group that has only one row, for example:

library(dplyr)

df <- data.frame(ID = c(1:7),                                   
                 gender = c("male","male", "male", "female", "female","female","female"),               
                 assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))  

df |> 
  group_by(assets) |> 
  filter(n() > 1) |> 
  arrange(assets, ID)
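
Note that `n() > 1` counts rows, not genders, so it matches the expected output only when no combination is repeated within one gender. The sketch below illustrates this with a hypothetical extra row (ID 8, a second male holding "a,e") that is not part of the question's data; `df2`, `by_count` and `by_gender` are names made up for the illustration.

```r
library(dplyr)

# Hypothetical extension of the question's data: ID 8 is a second male with "a,e".
df2 <- data.frame(
  ID = 1:8,
  gender = c("male", "male", "male", "female", "female", "female", "female", "male"),
  assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d", "a,e")
)

# Counting rows keeps "a,e" because two males hold it, although no female does.
by_count <- df2 %>%
  group_by(assets) %>%
  filter(n() > 1) %>%
  ungroup()

# Counting distinct genders keeps only combinations held by both genders.
by_gender <- df2 %>%
  group_by(assets) %>%
  filter(n_distinct(gender) == 2) %>%
  ungroup()
```

With the original seven rows both filters keep the same rows; they differ only once a combination repeats within a single gender.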

Alternatively, keep only the groups in which both genders occur:

df %>%
   group_by(assets) %>%
   filter(all(c('male', 'female') %in% gender))

# A tibble: 4 x 3
# Groups:   assets [2]
     ID gender assets 
  <int> <chr>  <chr>  
1     2 male   a,b,e,d
2     3 male   b,c,e  
3     4 female b,c,e  
4     5 female a,b,e,d
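
For the follow-up question (combinations that males and females do not share), one option is to negate the same grouped condition. This is a minimal sketch rather than part of the original answer; `diffassets` is just an illustrative name.

```r
library(dplyr)

df <- data.frame(
  ID = 1:7,
  gender = c("male", "male", "male", "female", "female", "female", "female"),
  assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d")
)

# Keep the asset groups in which at least one of the two genders is missing.
diffassets <- df %>%
  group_by(assets) %>%
  filter(!all(c("male", "female") %in% gender)) %>%
  ungroup()
```

On the question's data this should leave the rows with ID 1, 6 and 7, which is the desired `diffassests` output.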

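Another option is to group by assets first and then check whether the number of distinct gender values within each assets group equals the number of distinct gender values in the whole data frame. The first gender refers to the gender values within each group, while .$gender refers to the entire gender column. The idea is adapted from another post.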

library(dplyr)

df %>% 
  group_by(assets) %>% 
  filter(n_distinct(gender) == n_distinct(.$gender))

Or a possible base R solution:

df[df$assets %in% Reduce(intersect, split(df$assets, df$gender)), ]

Output

     ID gender assets 
  <int> <chr>  <chr>  
1     2 male   a,b,e,d
2     3 male   b,c,e  
3     4 female b,c,e  
4     5 female a,b,e,d
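
The base R one-liner above can be easier to follow when split into steps. The sketch below does that; the intermediate names `by_gender`, `shared` and `same` are introduced only for illustration.

```r
df <- data.frame(
  ID = 1:7,
  gender = c("male", "male", "male", "female", "female", "female", "female"),
  assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d")
)

# 1. One character vector of asset combinations per gender.
by_gender <- split(df$assets, df$gender)

# 2. Combinations that appear in every gender's vector.
shared <- Reduce(intersect, by_gender)

# 3. Rows whose combination is among the shared ones.
same <- df[df$assets %in% shared, ]
```

Because `Reduce(intersect, ...)` folds over every element of the list, the same code would also work if gender had more than two levels.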