Search for a word/phrase in a column in R
I have data like this:
> head(df)
ID Comment
1 1 I ate dinner.
2 2 We had a three-course meal.
3 3 Brad came to dinner with us.
4 4 He loves fish tacos.
5 5 In the end, we all felt like we ate too much. Code 5.16
6 6 We all agreed; it was a magnificent evening.72 points.
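I want to create two new columns, one named A and one named B.
If any of the following words/phrases occur, I want column A to equal 1: dinner, evening, we ate
If any of the following words/phrases occur, I want column B to equal 1: in the end, all, Brad, 5.16
How can I do this? Note that I need exact matches.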
We can use grepl in base R
# \\b is a word boundary; \\. is a literal dot; +() converts TRUE/FALSE to 1/0
df$A <- +(grepl("\\b(dinner|evening|we ate)\\b", df$Comment))
df$B <- +(grepl("\\b(in the end|all|Brad|5\\.16)\\b", df$Comment))
-output
df
ID Comment A B
1 1 I ate dinner. 1 0
2 2 We had a three-course meal. 0 0
3 3 Brad came to dinner with us. 1 1
4 4 He loves fish tacos. 0 0
5 5 In the end, we all felt like we ate too much. Code 5.16 1 1
6 6 We all agreed; it was a magnificent evening.72 points. 1 1
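To make the behaviour of the word boundary and the escaped dot concrete, here is a minimal sketch in base R. The vector `x` and the result names are made up for illustration; two of the strings are taken from the sample comments.

```r
x <- c("a magnificent evening.72 points.", "Code 5.16", "Code 5916")

# "." is not a word character, so a boundary exists between "evening" and ".72"
whole_word <- grepl("\\bevening\\b", x)

# An unescaped "." matches any character, so this also matches "5916"
loose_dot <- grepl("5.16", x)

# Escaping the dot restricts the match to the literal text "5.16"
literal_dot <- grepl("5\\.16", x)

# Unary plus turns the logical result into 0/1 integers
flags <- +literal_dot
```

Here `loose_dot` flags the third string while `literal_dot` does not, which is why the dot in `5.16` should be escaped when an exact match is required.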
Note: the pattern can also be created with paste
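v1 <- c("dinner", "evening", "we ate")
v2 <- c("in the end", "all", "Brad", "5\\.16")  # escape the dot so it matches literally
# wrap the alternation in word boundaries so only whole words/phrases match
pat1 <- paste0("\\b(", paste(v1, collapse = "|"), ")\\b")
pat2 <- paste0("\\b(", paste(v2, collapse = "|"), ")\\b")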
df$A <- +(grepl(pat1, df$Comment))
df$B <- +(grepl(pat2, df$Comment))
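If the terms come from elsewhere and cannot be escaped by hand, the escaping can be done in code before the pattern is pasted together. The following is only a sketch: `escape_regex` and `build_pattern` are hypothetical helper names, not base R functions, and the list of escaped metacharacters is an assumption that covers the common cases.

```r
# Hypothetical helper: prefix regex metacharacters with a backslash
# so that each term is matched literally
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: join literal terms into one whole-word pattern
build_pattern <- function(terms) {
  paste0("\\b(", paste(escape_regex(terms), collapse = "|"), ")\\b")
}

comments <- c("Code 5.16", "Code 5916", "Brad came to dinner with us.")
pat <- build_pattern(c("in the end", "all", "Brad", "5.16"))

# 1 where any term occurs as a whole word/phrase, 0 otherwise
flags <- +grepl(pat, comments)
```

With this approach the term vector can hold the plain text `"5.16"`, and the second comment is not flagged because the dot no longer matches an arbitrary character.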
data
df <- structure(list(ID = 1:6, Comment = c("I ate dinner.", "We had a three-course meal.",
"Brad came to dinner with us.", "He loves fish tacos.", "In the end, we all felt like we ate too much. Code 5.16",
"We all agreed; it was a magnificent evening.72 points.")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Does this work:
library(dplyr)
library(stringr)
df %>% mutate(A = +str_detect(Comment,str_c(c('dinner','evening','we ate'), collapse = '|')),
       B = +str_detect(Comment, str_c(c('in the end', 'all', 'Brad', '5\\.16'), collapse = '|')))
# A tibble: 6 x 4
ID Comment A B
<dbl> <chr> <int> <int>
1 1 I ate dinner. 1 0
2 2 We had a three-course meal. 0 0
3 3 Brad came to dinner with us. 1 1
4 4 He loves fish tacos. 0 0
5 5 In the end, we all felt like we ate too much. Code 5.16 1 1
6 6 We all agreed; it was a magnificent evening.72 points 1 1
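One caveat with a plain alternation pattern like the one above: it has no word boundaries, so it matches substrings as well as whole words. The sketch below uses base R `grepl` and made-up sentences to show the difference. As far as I know, `str_detect` accepts the same `\\b` syntax, but treat that as an assumption to verify on your own data.

```r
comments <- c("We all agreed.", "He waited in the hallway.", "A small meal.")

# Plain pattern: "all" is also found inside "hallway" and "small"
substring_hit <- grepl("all", comments)

# Word boundaries keep only the standalone word "all"
whole_word_hit <- grepl("\\ball\\b", comments)
```

On the six sample comments both patterns give the same result, so the difference only shows up with text such as the second and third sentences here.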