Detect a partial match in a column in a df1 with a column in df2 and output the matched value in R

I have two data frames in R:

col1 <- c("Apple pie", "Orange soda", "Pear", "Strawberry milkshake", "Kiwi")

col2 <- c("Delicious", "Refreshing", "Crunchy", "Creamy", "Sweet")

df1 <- data.frame(col1, col2)

fruits <- c("Blueberry", "Apple", "Pear", "Orange", "Watermelon", "Honeydew", "Dragonfruit", "Strawberry")

df2 <- as.data.frame(fruits)

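I want to check whether any value in df1 partially matches a value in df2. For example, I want "Apple pie" to match "Apple" and "Orange soda" to match "Orange".

I would like df1 to end up looking like this: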

col3 <- c("Apple", "Orange", "Pear", NA, NA)
  
df1 <- data.frame(col1, col2, col3)

I assume the code would be structured something like this:

df1 <- df1 %>%
  mutate(
    col3 = ifelse(df2$fruits %in% str_detect(col1),
                  df2$fruits, NA)
  )

Any help would be appreciated!

We can use the map function from purrr:

library(tidyverse)

df1 %>%
    mutate(col3 = map(col1, ~df2$fruits[str_detect(.x, df2$fruits)]))

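This covers the basics, but you will need to do some work to clean up the output, since col3 is a list column.

Another option is to use fuzzy_left_join from the fuzzyjoin package: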
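One possible way to flatten that list column is sketched below. It swaps map for map_chr and keeps only the first matching fruit per row, returning NA when nothing matches. Keeping the first match is an assumption on my part (a row could match several fruits), and the `first_match` helper and the `result` name are hypothetical, introduced only for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Same data as in the question, rebuilt so the example is self-contained.
df1 <- data.frame(
  col1 = c("Apple pie", "Orange soda", "Pear", "Strawberry milkshake", "Kiwi"),
  col2 = c("Delicious", "Refreshing", "Crunchy", "Creamy", "Sweet"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruits = c("Blueberry", "Apple", "Pear", "Orange", "Watermelon",
             "Honeydew", "Dragonfruit", "Strawberry"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first fruit found in x, or NA if none is found.
# Each fruit name is treated as a regular expression, which is fine here
# because the names contain no regex metacharacters.
first_match <- function(x, candidates) {
  hits <- candidates[str_detect(x, candidates)]
  if (length(hits) == 0) NA_character_ else hits[1]
}

# map_chr returns a plain character vector instead of a list column.
result <- df1 %>%
  mutate(col3 = map_chr(col1, first_match, candidates = df2$fruits))

print(result)
```

If a row can legitimately match more than one fruit and you want all of them, you could instead collapse the hits with paste(hits, collapse = ", ") inside the helper.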

library(tidyverse)
library(fuzzyjoin)

df1 %>%
    fuzzy_left_join(df2, by = c("col1" = "fruits"),
                    match_fun = str_detect)

                  col1       col2     fruits
1            Apple pie  Delicious      Apple
2          Orange soda Refreshing     Orange
3                 Pear    Crunchy       Pear
4 Strawberry milkshake     Creamy Strawberry
5                 Kiwi      Sweet       <NA>