How to query a dataframe using a column of other dataframe in R

Question

I have 2 data frames in R, and I want to query data frame "x" using data frame "y" as the parameter.

I have this code:

x <- c('The book is on the table','I hear birds outside','The electricity 
came back')
x <- data.frame(x)
colnames(x) <- c('text')
x

y <- c('book','birds','electricity')
y <- data.frame(y)
colnames(y) <- c('search')
y

library(sqldf)

r <- sqldf("select * from x where text IN (select search from y)")
r

I would like to use "like" here, but I don't know how. Can you help me?

Answer 1

You can use the fuzzyjoin package:

library(dplyr)
library(fuzzyjoin)

regex_join(
  mutate_if(x, is.factor, as.character), 
  mutate_if(y, is.factor, as.character), 
  by = c("text" = "search")
)

#                          text      search
# 1    The book is on the table        book
# 2        I hear birds outside       birds
# 3 The electricity \ncame back electricity

Answer 2

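It is hard to know whether this is what you want without more varied test data. To add a little variety, I added an extra word, 'cat', to y$search, so the search terms are c('book','birds','electricity', 'cat'). More variety would clarify things further.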

Do you just want to know which words appear in which sentences? Use sapply and grepl:

> m = sapply(y$search, grepl, x$text)
> rownames(m) = x$text
> colnames(m) = y$search
> m
                             book birds electricity   cat
The book is on the table     TRUE FALSE       FALSE FALSE
I hear birds outside        FALSE  TRUE       FALSE FALSE
The electricity \ncame back FALSE FALSE        TRUE FALSE
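If you also want the matches as (text, search) pairs rather than a logical matrix, something along these lines might work. It is only a sketch in base R: the data frames are rebuilt here with stringsAsFactors = FALSE, the names hits, matches and unmatched_terms are made up for illustration, and fixed = TRUE is an assumption that the search terms are literal strings rather than regular expressions.

```r
# Sample data from the question, plus the extra term 'cat'.
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back"),
                stringsAsFactors = FALSE)
y <- data.frame(search = c("book", "birds", "electricity", "cat"),
                stringsAsFactors = FALSE)

# Logical matrix: one row per text, one column per search term.
# fixed = TRUE treats each term as a literal string (assumption).
m <- sapply(y$search, grepl, x$text, fixed = TRUE)

# Row/column positions of every TRUE cell in the matrix.
hits <- which(m, arr.ind = TRUE)

# One row per (text, search term) match.
matches <- data.frame(text   = x$text[hits[, "row"]],
                      search = y$search[hits[, "col"]],
                      stringsAsFactors = FALSE)
matches

# Search terms that did not match any text.
unmatched_terms <- y$search[colSums(m) == 0]
unmatched_terms
```

With this data, matches should have three rows and 'cat' should be the only unmatched term.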

Do you only want to extract the matching rows?

> library(magrittr)  # To use the pipe, "%>%"
> x %>% data.table::setDT()  # To return the result as a table easily
>
> x[(sapply(y$search, grepl, x$text) %>% rowSums() %>% as.logical()) * (1:nrow(x)), ]
                          text
1:    The book is on the table
2:        I hear birds outside
3: The electricity \ncame back

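@Aurèle's solution gives the best result if you want each matching text alongside the term it matched. Note that if back were also in y$search, the text The electricity \ncame back would be reported twice in that result, once for each search term it matches, so it is the better choice when uniqueness is not important.

So it depends a lot on the output you want.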
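To make the point about duplicates concrete, here is a small base R sketch that adds 'back' to the search terms and compares a join-style result with a filter-style one. The names idx, pairs, joined and filtered are hypothetical, and the join is emulated with expand.grid and grepl rather than fuzzyjoin, so treat it as an illustration of the behaviour, not as the exact implementation of either approach.

```r
# Sample data from the question, with 'back' added as a search term.
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back"),
                stringsAsFactors = FALSE)
y <- data.frame(search = c("book", "birds", "electricity", "back"),
                stringsAsFactors = FALSE)

# Join-style: build every (text, term) combination, keep the ones that match.
idx <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
pairs <- data.frame(text   = x$text[idx$i],
                    search = y$search[idx$j],
                    stringsAsFactors = FALSE)
keep <- mapply(grepl, pairs$search, pairs$text,
               MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
joined <- pairs[keep, ]

# Filter-style: keep each text once if any term matches it.
m <- sapply(y$search, grepl, x$text, fixed = TRUE)
filtered <- x[rowSums(m) > 0, , drop = FALSE]

nrow(joined)    # 4: the electricity sentence appears for 'electricity' and 'back'
nrow(filtered)  # 3: each sentence appears once
```

If you need the matched term, the join-style shape is the one to use; if you only need the rows of x, the filter-style shape avoids the repeated row.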

Answer 3

If you want a sqldf solution, I think this works:

sqldf("select x.text, y.search FROM x JOIN y on x.text LIKE '%' || y.search || '%'")

##                          text      search
## 1    The book is on the table        book
## 2        I hear birds outside       birds
## 3 The electricity \ncame back electricity
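If you would rather keep the shape of the original query (select * from x where ...) and get each row of x at most once, a correlated EXISTS subquery with the same LIKE pattern may be an option. This is a sketch that assumes sqldf is installed and is using its default SQLite backend; 'back' is added to the search terms only to show that the third sentence is not repeated.

```r
library(sqldf)  # assumes the sqldf package is installed

# Sample data from the question, with 'back' added as a search term.
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back"),
                stringsAsFactors = FALSE)
y <- data.frame(search = c("book", "birds", "electricity", "back"),
                stringsAsFactors = FALSE)

# Keep a row of x if at least one search term occurs somewhere in its text.
r <- sqldf("select * from x
            where exists (select 1 from y
                          where x.text like '%' || y.search || '%')")
r
```

One thing to keep in mind: in SQLite, LIKE is case-insensitive for ASCII characters by default, while grepl is case-sensitive unless you pass ignore.case = TRUE, so the two approaches can disagree on mixed-case data.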
