Retrieve rows where both factor levels in column A share the same value in column B, and find which asset combinations males do not share with females
Column A (gender) has two levels, male and female.
Column B (assets) holds combinations of five assets, a through e.
df <- data.frame(ID = c(1:7),
gender = c("male","male", "male", "female", "female","female","female"),
assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))
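How can I retrieve the rows where a male and a female share the same asset combination?
I have no idea how to write the R syntax for this; here is what I tried: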
sameassets <- df %>% filter(filter(gender="male",assets) == filter(gender="female",assets))
Expected output:
sameassests <- data.frame(ID = c(2,5,3,4),
gender = c("male", "female", "male", "female"),
assets = c("a,b,e,d", "a,b,e,d", "b,c,e","b,c,e"))
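Can anyone help?
Edited to include an additional question.
I would also like to know which asset combinations are not shared between males and females,
so the desired output looks like this: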
diffassests <- data.frame(ID = c(1,6,7),
gender = c("male", "female", "female"),
assets = c("a,e", "c,d", "a,d"))
You can do this by grouping by assets and then filtering out every asset group that has only one row, for example:
library(dplyr)
df <- data.frame(ID = c(1:7),
gender = c("male","male", "male", "female", "female","female","female"),
assets = c("a,e","a,b,e,d", "b,c,e","b,c,e", "a,b,e,d", "c,d","a,d"))
df |>
group_by(assets) |>
filter(n() > 1) |>
arrange(assets, ID)
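One caveat on the row-count filter, shown as a small sketch: it works on the question's data because no asset combination repeats within a single gender. The data frame `df_dup` below is hypothetical (it is not from the question) and only illustrates what would happen if two males shared a combination that no female has.

```r
library(dplyr)

# Hypothetical variant of the question's data: IDs 1 and 2 are both male
# and share "a,e", but no female has that combination.
df_dup <- data.frame(
  ID     = 1:4,
  gender = c("male", "male", "male", "female"),
  assets = c("a,e", "a,e", "b,c,e", "b,c,e")
)

# Row-count filter: also keeps the male-only "a,e" group.
by_count <- df_dup |>
  group_by(assets) |>
  filter(n() > 1) |>
  ungroup()

# Membership filter: keeps only groups in which both genders appear.
by_gender <- df_dup |>
  group_by(assets) |>
  filter(all(c("male", "female") %in% gender)) |>
  ungroup()
```

Here `by_count` retains all four rows, while `by_gender` retains only IDs 3 and 4, so checking group membership is the safer choice when combinations may repeat within a gender.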
A stricter filter keeps only the asset groups in which both genders appear:
df %>%
group_by(assets) %>%
filter(all(c('male', 'female') %in% gender))
# A tibble: 4 x 3
# Groups: assets [2]
ID gender assets
<int> <chr> <chr>
1 2 male a,b,e,d
2 3 male b,c,e
3 4 female b,c,e
4 5 female a,b,e,d
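For the follow-up question about combinations that are not shared, one possible approach is to negate the same group-level condition. This is a sketch rather than part of the original answers; the name `diffassets` is chosen here for illustration.

```r
library(dplyr)

df <- data.frame(ID = 1:7,
                 gender = c("male", "male", "male", "female", "female", "female", "female"),
                 assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d"))

# Keep the asset groups that do NOT contain both genders.
diffassets <- df %>%
  group_by(assets) %>%
  filter(!all(c("male", "female") %in% gender)) %>%
  ungroup()
```

On the question's data this should leave IDs 1, 6 and 7, which matches the desired output given in the edit.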
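Another option is to group by assets first and then check whether the number of distinct gender values within each assets group equals the number of distinct gender values in the whole data frame. The first gender refers to the gender values of the current group, while .$gender refers to the entire gender column. See here for the original idea.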
library(dplyr)
df %>%
group_by(assets) %>%
filter(n_distinct(gender) == n_distinct(.$gender))
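To make the comparison inside that filter visible, the per-group and whole-column counts can be computed separately. This is only an illustrative sketch; `gender_counts`, `n_total` and `shared` are names introduced here, not part of the answer.

```r
library(dplyr)

df <- data.frame(ID = 1:7,
                 gender = c("male", "male", "male", "female", "female", "female", "female"),
                 assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d"))

# Number of distinct genders within each asset combination.
gender_counts <- df %>%
  group_by(assets) %>%
  summarise(n_gender = n_distinct(gender), .groups = "drop")

# Number of distinct genders in the whole column (what .$gender gives in the pipe).
n_total <- n_distinct(df$gender)

# Combinations held by every gender.
shared <- gender_counts$assets[gender_counts$n_gender == n_total]
```

One advantage of comparing against `n_total` instead of hard-coding 'male' and 'female' is that the filter keeps working if the gender column gains further levels.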
Or a possible base R solution:
df[df$assets %in% Reduce(intersect, split(df$assets, df$gender)), ]
Output
ID gender assets
<int> <chr> <chr>
1 2 male a,b,e,d
2 3 male b,c,e
3 4 female b,c,e
4 5 female a,b,e,d
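The base R one-liner can be unpacked into its intermediate steps, which may make it easier to follow. The variable names `by_gender`, `shared` and `same_rows` are introduced here for illustration only.

```r
df <- data.frame(ID = 1:7,
                 gender = c("male", "male", "male", "female", "female", "female", "female"),
                 assets = c("a,e", "a,b,e,d", "b,c,e", "b,c,e", "a,b,e,d", "c,d", "a,d"))

# Step 1: one character vector of asset combinations per gender.
by_gender <- split(df$assets, df$gender)

# Step 2: fold intersect() over the list to keep combinations present in every gender.
shared <- Reduce(intersect, by_gender)

# Step 3: subset the original rows whose combination is in the shared set.
same_rows <- df[df$assets %in% shared, ]
```

Because `Reduce()` folds over however many elements the list has, the same code would also apply if gender had more than two levels.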