Regex: match words from a pattern vector in a string
I have this pattern vector of antibiotics:
atb <- c("acefa","ampicilin","fortum")
And this data frame:
DF1 <- structure(list(ID = 1:3, Text = c("Person 1 take acefa and ampicilin", "fortum and acefa are antibiotics", "Person 3 has no antibiotics but ampicilin")), class = "data.frame", row.names = c(NA, -3L))
DF1
  ID                                      Text
1  1         Person 1 take acefa and ampicilin
2  2          fortum and acefa are antibiotics
3  3 Person 3 has no antibiotics but ampicilin
I want this:
DF1
  ID                                      Text                    atb
1  1         Person 1 take acefa and ampicilin c("acefa","ampicilin")
2  2          fortum and acefa are antibiotics    c("fortum","acefa")
3  3 Person 3 has no antibiotics but ampicilin              ampicilin
I tried:
DF1 %>%
  mutate(atb = regmatches(Text, regexec(atb, Text)))
and
DF1 %>%
  mutate(atb = str_extract_all(Text, atb))
but neither works.
However, it does work with grepl, like this:
DF1 %>%
  mutate(atb = grepl(atb, Text))
Can I get a column containing the words from the pattern that match?
Build a single regular expression from the vector and use strapplyc (from the gsubfn package):
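library(dplyr)
library(gsubfn)

# Collapse atb into one alternation, "acefa|ampicilin|fortum", and
# extract every match from each Text value into a list column.
result <- DF1 %>%
  mutate(atb = strapplyc(Text, paste(atb, collapse = "|")))
str(result$atb)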
giving:
List of 3
$ : chr [1:2] "acefa" "ampicilin"
$ : chr [1:2] "fortum" "acefa"
$ : chr "ampicilin"
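The attempts in the question pass the length-3 vector atb directly as the pattern. regexec expects a single regular expression, and str_extract_all pairs pattern elements with strings element by element, so neither searches every string for every word. For comparison, here is a minimal base R sketch of the same collapse-then-extract idea without extra packages. The `\\b` word-boundary anchors and `perl = TRUE` are my own additions, assuming whole-word matches are wanted; drop them to match substrings as well.

```r
# Same example data as in the question.
atb <- c("acefa", "ampicilin", "fortum")
DF1 <- data.frame(
  ID = 1:3,
  Text = c("Person 1 take acefa and ampicilin",
           "fortum and acefa are antibiotics",
           "Person 3 has no antibiotics but ampicilin"),
  stringsAsFactors = FALSE
)

# Collapse the vector into one alternation: "\\b(acefa|ampicilin|fortum)\\b".
# The \\b anchors are an assumption: they restrict matches to whole words.
pattern <- paste0("\\b(", paste(atb, collapse = "|"), ")\\b")

# gregexpr() locates every match in each string;
# regmatches() extracts them as a list with one character vector per row.
DF1$atb <- regmatches(DF1$Text, gregexpr(pattern, DF1$Text, perl = TRUE))

str(DF1$atb)
```

As with strapplyc, the result is a list column, so a row with no match would hold character(0) rather than NA.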