R / tidyverse - find intersect between multiple character columns
I have the following problem: I have a tibble with multiple character columns.
I tried to provide an MRE below:
library(tidyverse)
df <- tibble(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt")
)
df %>%
  # rowwise() %>%
  mutate(allcolumns = map2(
    str_split(food, ", "),
    str_split(food2, ", "),
    # str_split(food3, ", "),
    intersect
  ) %>% unlist()
  ) -> df_new
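My goal is to get the words that are common to all columns. Within each column, the words are separated by ", ". In the MRE I was able to find the intersection between two columns, but I could not find a way to extend this to all columns. I tried Reduce but did not get it to work.
As an edit: I would also like to append the result to the existing tibble as a new column.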
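We can use map to loop over the columns and apply str_split, transpose the result so that the pieces are grouped by row, and then reduce with intersect to get the elementwise intersect.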
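library(dplyr)
library(purrr)
library(stringr)
df %>%
  purrr::map(str_split, ", ") %>%          # split every column into word vectors
  transpose() %>%                          # regroup by row instead of by column
  purrr::map_chr(reduce, intersect) %>%    # shared words; needs exactly one per row
  mutate(df, Intersect = .)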
-output
# A tibble: 2 x 4
  food                    food2                      food3                       Intersect
  <chr>                   <chr>                      <chr>                       <chr>
1 pizza, bread, apple     bread, sausage, strawberry ice cream, bread, milkshake bread
2 joghurt, cereal, banana joghurt, oat, bacon        melon, cake, joghurt        joghurt
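One caveat with the pipeline above: map_chr expects exactly one shared word per row, so it errors when a row has several common words or none. A minimal sketch of one way around that is to paste the intersection into a single string. The data frame df2 below is hypothetical example data (not from the question), chosen so that one row shares two words and the other shares none.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical data: row 1 shares two words, row 2 shares none
df2 <- tibble(
  food  = c("pizza, bread, apple", "joghurt, cereal"),
  food2 = c("bread, apple, sausage", "oat, bacon"),
  food3 = c("apple, bread, milkshake", "melon, cake")
)

common <- df2 %>%
  map(str_split, ", ") %>%   # one list of word vectors per column
  transpose() %>%            # regroup so each element holds one row
  # collapse whatever is shared into one string; an empty intersection gives ""
  map_chr(~ paste(reduce(.x, intersect), collapse = ", "))

result <- mutate(df2, Intersect = common)
result$Intersect
```

Rows without any common word end up with an empty string here; whether "" or NA is the better marker depends on what happens to the column afterwards.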
Or we could also use pmap
df %>%
  mutate(Intersect = pmap(across(everything(), ~ str_split(.x, ", ")),
                          ~ list(...) %>%
                            reduce(intersect)))   # Intersect is a list-column
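Since the question mentions an unsuccessful attempt with Reduce, here is a rough base R sketch of the same idea, without purrr. It is only one possible formulation, not the approach used above: split each column with strsplit, walk the columns in parallel with Map, and fold intersect over each row with Reduce. The names split_cols and common are made up for this illustration.

```r
# Same data as the MRE in the question, as a plain data.frame
df <- data.frame(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt"),
  stringsAsFactors = FALSE
)

# Split every column on ", ": one list of word vectors per column
split_cols <- lapply(df, strsplit, split = ", ", fixed = TRUE)

# Map visits the columns in parallel (row by row);
# Reduce folds intersect over the word vectors of each row
common <- do.call(
  Map,
  c(list(f = function(...) Reduce(intersect, list(...))), unname(split_cols))
)

# Collapse each row's shared words into one string and append as a new column
df$Intersect <- vapply(common, paste, character(1), collapse = ", ")
df$Intersect
```

The key point is that Reduce has to be applied per row to a list of word vectors, which is why the split columns are first regrouped with Map.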