Regex string match words pattern

Question

I have this pattern with antibiotics:

atb <- c("acefa","ampicilin","fortum")

and this data frame:

    DF1 <- structure(list(ID = 1:3, Text = c("Person 1 take acefa and ampicilin", "fortum and acefa are antibiotics", "Person 3 has no antibiotics but ampicilin")), class = "data.frame", row.names = c(NA, -3L))

DF1
    
    ID                                      Text
    1           Person 1 take acefa and ampicilin
    2            fortum and acefa are antibiotics
    3   Person 3 has no antibiotics but ampicilin

I want this:

DF1
        
    ID                                      Text        atb
    1           Person 1 take acefa and ampicilin      c("acefa","ampicilin")
    2            fortum and acefa are antibiotics      c("fortum","acefa")
    3   Person 3 has no antibiotics but ampicilin      ampicilin

I tried

DF1%>%
mutate(atb = regmatches(Text, regexec(atb, Text)))

and

DF1%>%
mutate(atb = str_extract_all(Text, atb))

but neither works.

However, it does work with grepl, like this:

DF1%>%
    mutate(atb = grepl(atb, Text))

Can I get a column containing the words from the pattern that appear in each text?

Answer 1

Build a single regular expression from the patterns and use strapplyc from the gsubfn package:

library(dplyr)
library(gsubfn)

result <- DF1 %>% 
  mutate(atb = strapplyc(Text, paste(atb, collapse = "|")))

str(result$atb)

giving:

List of 3
 $ : chr [1:2] "acefa" "ampicilin"
 $ : chr [1:2] "fortum" "acefa"
 $ : chr "ampicilin"
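
For comparison, here is a minimal base R sketch of the same idea, assuming you would rather not load gsubfn. It is not part of the answer above, just one possible equivalent: the vector is collapsed into a single alternation, and the `\\b` word boundaries are my own addition so that only whole words match. The variable name `pattern` is only illustrative.

```r
# Base R sketch: no extra packages assumed.
atb <- c("acefa", "ampicilin", "fortum")

DF1 <- data.frame(
  ID = 1:3,
  Text = c("Person 1 take acefa and ampicilin",
           "fortum and acefa are antibiotics",
           "Person 3 has no antibiotics but ampicilin"),
  stringsAsFactors = FALSE
)

# Collapse the vector into one alternation: "\\b(acefa|ampicilin|fortum)\\b".
# The \\b anchors are an assumption about the desired behaviour (whole words only).
pattern <- paste0("\\b(", paste(atb, collapse = "|"), ")\\b")

# gregexpr() locates every match in each string;
# regmatches() extracts the matched text as a list with one element per row.
DF1$atb <- regmatches(DF1$Text, gregexpr(pattern, DF1$Text))

str(DF1$atb)
```

The result should be a list column with one character vector per row, as in the strapplyc output. A row with no match would get `character(0)`. The attempts in the question most likely fail because `regexec` accepts only a single pattern, not a vector, which is why the vector is collapsed first.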
