Search for a word/phrase in a column in R

I have data like this:

> head(df)
  ID                                                 Comment
1  1                                            I ate dinner.
2  2                              We had a three-course meal.
3  3                             Brad came to dinner with us.
4  4                                     He loves fish tacos.
5  5  In the end, we all felt like we ate too much. Code 5.16
6  6   We all agreed; it was a magnificent evening.72 points.

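I want to create two new columns, one named A and one named B. Column A should equal 1 if any of the following words/phrases occur: dinner, evening, we ate. Column B should equal 1 if any of the following words/phrases occur: in the end, all, Brad, 5.16.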

How can I do this? Note that I need exact matches.

We can use grepl in base R:

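# Backslashes must be doubled inside R strings; \\b marks a word boundary
# The unary + converts the logical result of grepl to 0/1
df$A <- +(grepl("\\b(dinner|evening|we ate)\\b", df$Comment))
df$B <- +(grepl("\\b(in the end|all|Brad|5\\.16)\\b", df$Comment))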

Output:

df
  ID                                                 Comment A B
1  1                                           I ate dinner. 1 0
2  2                             We had a three-course meal. 0 0
3  3                            Brad came to dinner with us. 1 1
4  4                                    He loves fish tacos. 0 0
5  5 In the end, we all felt like we ate too much. Code 5.16 1 1
6  6  We all agreed; it was a magnificent evening.72 points. 1 1
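
To see why the word boundaries and the escaped dot matter for exact matching, here is a small sketch on made-up strings. The vector x and the variable names are illustrative only and are not part of the question's data.

```r
# Illustrative strings (not from the question's data)
x <- c("We all agreed.", "A small meal.", "Code 5.16", "Code 5x16")

# Without word boundaries, "all" also matches inside "small"
no_boundary <- grepl("all", x)

# With \\b on both sides, only the standalone word matches
with_boundary <- grepl("\\ball\\b", x)

# An unescaped dot matches any character; "\\." matches a literal dot
dot_any <- grepl("\\b5.16\\b", x)
dot_literal <- grepl("\\b5\\.16\\b", x)

# Unary plus coerces TRUE/FALSE to integer 1/0
flags <- +with_boundary

print(data.frame(x, no_boundary, with_boundary, dot_any, dot_literal, flags))
```

Note that grepl is case-sensitive by default, so "In the end" in row 5 does not match "in the end"; that row gets B = 1 because of "all" and "5.16".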

Note: the patterns can also be built with paste:

v1 <- c("dinner", "evening", "we ate")
v2 <- c("in the end", "all", "Brad", "5\\.16")  # escape the dot to match it literally
# Join the alternatives with | and wrap them in word boundaries
pat1 <- paste0("\\b(", paste(v1, collapse = "|"), ")\\b")
pat2 <- paste0("\\b(", paste(v2, collapse = "|"), ")\\b")
df$A <- +(grepl(pat1, df$Comment))
df$B <- +(grepl(pat2, df$Comment))
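
If the phrases come from user input or a file, escaping metacharacters by hand is easy to forget. A minimal sketch of one way to automate it follows; escape_regex and make_pattern are hypothetical helper names introduced here, not functions from base R or from the answer above, and the comments vector is made up for illustration.

```r
# Hypothetical helper: backslash-escape regex metacharacters so that
# each phrase is matched literally
escape_regex <- function(x) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: build one word-bounded alternation from literal phrases
make_pattern <- function(phrases) {
  paste0("\\b(", paste(escape_regex(phrases), collapse = "|"), ")\\b")
}

# Illustrative input (not the question's full data)
comments <- c("I ate dinner.", "Code 5.16", "Code 5x16", "We all agreed.")

pat_b <- make_pattern(c("in the end", "all", "Brad", "5.16"))
b <- +grepl(pat_b, comments)

print(pat_b)
print(b)
```

One caveat with this sketch: \\b only behaves as expected when each phrase starts and ends with a word character (a letter, digit, or underscore), which holds for all the phrases in the question.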

Data

df <- structure(list(ID = 1:6, Comment = c("I ate dinner.", "We had a three-course meal.", 
"Brad came to dinner with us.", "He loves fish tacos.", "In the end, we all felt like we ate too much. Code 5.16", 
"We all agreed; it was a magnificent evening.72 points.")),
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Does this work:

library(dplyr)
library(stringr)

# The dot in '5.16' is escaped so that it only matches a literal dot
df %>%
  mutate(A = +str_detect(Comment, str_c(c('dinner', 'evening', 'we ate'), collapse = '|')),
         B = +str_detect(Comment, str_c(c('in the end', 'all', 'Brad', '5\\.16'), collapse = '|')))
# A tibble: 6 x 4
     ID Comment                                                     A     B
  <dbl> <chr>                                                   <int> <int>
1     1 I ate dinner.                                               1     0
2     2 We had a three-course meal.                                 0     0
3     3 Brad came to dinner with us.                                1     1
4     4 He loves fish tacos.                                        0     0
5     5 In the end, we all felt like we ate too much. Code 5.16     1     1
6     6 We all agreed; it was a magnificent evening.72 points       1     1