Detect a partial match between a column in df1 and a column in df2 and output the matched value in R
I have two data frames in R:
col1 <- c("Apple pie", "Orange soda", "Pear", "Strawberry milkshake", "Kiwi")
col2 <- c("Delicious", "Refreshing", "Crunchy", "Creamy", "Sweet")
df1 <- data.frame(col1, col2)
fruits <- c("Blueberry", "Apple", "Pear", "Orange", "Watermelon", "Honeydew", "Dragonfruit", "Strawberry")
df2 <- as.data.frame(fruits)
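I want to check whether any value in df1 matches a value in df2. For example, I want "Apple pie" to match "Apple" and "Orange soda" to match "Orange".
I want to end up with a df1 that looks like this: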
col3 <- c("Apple", "Orange", "Pear", NA, NA)
df1 <- data.frame(col1, col2, col3)
I assume the code would be structured something like this:
df1 <- df1 %>%
  mutate(
    col3 = ifelse(df2$fruits %in% str_detect(col1),
                  df2$fruits, NA)
  )
Any help would be appreciated!
We can use the map function:
library(tidyverse)
df1 %>%
  mutate(col3 = map(col1, ~df2$fruits[str_detect(.x, df2$fruits)]))
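This covers the basics, but you will need to do some work to clean up the output, because col3 is a list column: each element holds every fruit found in that row (possibly none).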
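One possible way to do that cleanup is sketched below. It assumes you only want the first matching fruit per row and NA when nothing matches; that choice is mine, not something the question states. It swaps map for map_chr so that col3 becomes a plain character column. The name result is just illustrative.

```r
library(dplyr)
library(purrr)
library(stringr)

# Data from the question
col1 <- c("Apple pie", "Orange soda", "Pear", "Strawberry milkshake", "Kiwi")
col2 <- c("Delicious", "Refreshing", "Crunchy", "Creamy", "Sweet")
df1 <- data.frame(col1, col2)
fruits <- c("Blueberry", "Apple", "Pear", "Orange", "Watermelon",
            "Honeydew", "Dragonfruit", "Strawberry")
df2 <- as.data.frame(fruits)

# Assumption: keep only the first matching fruit, NA when there is none
result <- df1 %>%
  mutate(col3 = map_chr(col1, function(x) {
    hits <- df2$fruits[str_detect(x, df2$fruits)]
    if (length(hits) == 0) NA_character_ else hits[1]
  }))

print(result)
```

With this data, "Strawberry milkshake" is matched to "Strawberry", so col3 has a value in that row rather than the NA shown in the desired output of the question. The fruit names are treated as regular expressions here; if they could contain special characters, wrapping the pattern in fixed() may be safer.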
Another option is to use fuzzy_left_join from the fuzzyjoin package:
library(tidyverse)
library(fuzzyjoin)
df1 %>%
  fuzzy_left_join(df2, by = c("col1" = "fruits"),
                  match_fun = str_detect)
                  col1       col2     fruits
1            Apple pie  Delicious      Apple
2          Orange soda Refreshing     Orange
3                 Pear    Crunchy       Pear
4 Strawberry milkshake     Creamy Strawberry
5                 Kiwi      Sweet       <NA>