R / tidyverse - find intersect between multiple character columns

Question

I have the following problem: I have a tibble with multiple character columns. I have tried to provide an MRE below:

library(tidyverse)
df <- tibble(food = c("pizza, bread, apple","joghurt, cereal, banana"), 
             food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
             food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt")
             )
df %>%
  # rowwise() %>%
  mutate(allcolumns = map2(
    str_split(food, ", "),
    str_split(food2, ", "),
    # str_split(food3, ", "),
    intersect
  ) %>% unlist()
  ) -> df_new

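My goal is to get the words that are common to all columns. The words within a column are separated by a comma and a space (", "). In the MRE I am able to find the intersect between two columns, but I cannot find a way to extend this to all columns. I tried Reduce but could not get it to work.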

Edit: I would also like to append the result as a new column to the existing tibble.

Answer 1

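We can use map to loop over the columns and apply str_split, then transpose the result so the words are grouped by row, and finally reduce each row with intersect to get the elementwise intersect.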

library(dplyr)
library(purrr)
library(stringr)
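df %>% 
   purrr::map(str_split, ", ") %>%        # split every column into word vectors
   purrr::transpose() %>%                 # regroup the vectors by row instead of by column
   purrr::map_chr(reduce, intersect) %>%  # intersect per row; expects exactly one common word
   mutate(df, Intersect = .)              # attach the result to the original tibble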

-output

# A tibble: 2 x 4
  food                    food2                      food3                       Intersect
  <chr>                   <chr>                      <chr>                       <chr>    
1 pizza, bread, apple     bread, sausage, strawberry ice cream, bread, milkshake bread    
2 joghurt, cereal, banana joghurt, oat, bacon        melon, cake, joghurt        joghurt
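
Note that map_chr needs exactly one common word per row. The following is a minimal sketch, assuming rows may share several words or none at all, that collapses each row's intersection into a single comma-separated string. The data `df2` and the helper `row_intersect` are hypothetical names introduced only for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical data: row 1 shares two words, row 2 shares none
df2 <- tibble(
  food  = c("pizza, bread, apple", "joghurt, cereal"),
  food2 = c("bread, apple, sausage", "oat, bacon"),
  food3 = c("apple, bread, milkshake", "melon, cake")
)

# Hypothetical helper: intersect the words of any number of strings
row_intersect <- function(...) {
  words <- str_split(c(...), ", ")    # one character vector per column
  toString(reduce(words, intersect))  # empty string when nothing is shared
}

# pmap_chr calls the helper once per row, passing one value per column
result <- mutate(df2, Intersect = pmap_chr(df2, row_intersect))
result
```

Because the helper always returns a single string, the new column stays a plain character column regardless of how many words each row has in common.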

Alternatively, we can use pmap, which returns the intersect as a list column

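df %>%
    # a lambda is used because passing extra arguments through across() is deprecated since dplyr 1.1.0
    mutate(Intersect = pmap(across(everything(), ~ str_split(.x, ", ")), 
        ~ list(...) %>%
              reduce(intersect)))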
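
Since the question mentions Reduce, the same idea can be sketched in base R. This is only an illustration, assuming the data from the question (rebuilt here as a data.frame so the block is self-contained); `split_cols` and `common` are names chosen for this example.

```r
# Same data as in the question, rebuilt so the block runs on its own
df <- data.frame(
  food  = c("pizza, bread, apple", "joghurt, cereal, banana"),
  food2 = c("bread, sausage, strawberry", "joghurt, oat, bacon"),
  food3 = c("ice cream, bread, milkshake", "melon, cake, joghurt"),
  stringsAsFactors = FALSE
)

# Split every column into a list of word vectors (one vector per row)
split_cols <- lapply(df, strsplit, split = ", ", fixed = TRUE)

# For each row, collect that row's vectors across columns and intersect them
common <- lapply(seq_len(nrow(df)), function(i) {
  Reduce(intersect, lapply(split_cols, `[[`, i))
})

# Collapse to one string per row and append it as a new column
df$Intersect <- vapply(common, toString, character(1))
df
```

The key step is that Reduce receives one list per row, holding that row's word vectors from every column, rather than the columns themselves.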
