R: Multiple matches when one includes another

Question

I am trying to extract "Maya is ,. nice" (the quotes are not part of the string) from the string below:

"something ransom Maya wants to go for dinner with Shawn Maya is ,. nice"

However, I keep getting "Maya wants to go for dinner with Shawn Maya is ,. nice", which is not what I want.

Any insights? I am using stringr in R.

Answer 1

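One option in base R: match the word 'Maya' followed by 'is' and any other characters (.*) till the end ($) of the string, capture that as a group ((...)), and replace the whole string with the backreference to the captured group (\\1 in R string syntax). The leading .* followed by a word boundary (\\b) consumes everything before that "Maya".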

sub(".*\\b(Maya is .*$)", "\\1", str1)
#[1] "Maya is ,. nice"
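To see why the attempt in the question returns the longer string, here is a small base R sketch. The "naive" pattern "Maya.*$" is an assumption, since the question does not show the pattern actually used. A regex engine reports the leftmost match, so that pattern starts at the first "Maya". Putting a greedy .* in front of the capture group, as in the sub() call above, pushes the group to the last "Maya" instead, even without the extra "is".

```r
str1 <- "something ransom Maya wants to go for dinner with Shawn Maya is ,. nice"

# Hypothetical naive pattern (not shown in the question): the engine returns
# the leftmost match, so it begins at the FIRST "Maya" and .* runs to the end.
naive <- regmatches(str1, regexpr("Maya.*$", str1))

# A greedy .* before the group takes as much text as it can, so the
# captured group starts at the LAST "Maya" that follows a word boundary.
last_maya <- sub(".*\\b(Maya.*$)", "\\1", str1)

print(naive)
#[1] "Maya wants to go for dinner with Shawn Maya is ,. nice"
print(last_maya)
#[1] "Maya is ,. nice"
```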

Or with regexpr/regmatches:

regmatches(str1, regexpr("Maya is .*$", str1))
#[1] "Maya is ,. nice"

Or with stringr:

library(stringr)
str_extract(str1, "Maya is .*$")
#[1] "Maya is ,. nice"

NOTE: The expected output is the one shown in the OP's post.

Data

str1 <- "something ransom Maya wants to go for dinner with Shawn Maya is ,. nice"
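If the same extraction is needed for other keywords or for a vector of strings, the greedy-prefix idea from the sub() answer can be wrapped in a function. This is a minimal sketch: extract_from_last is a hypothetical helper (not part of base R or stringr), it treats `word` as a regular expression, and it returns NA where the keyword is absent instead of leaving the input unchanged as sub() would.

```r
str1 <- "something ransom Maya wants to go for dinner with Shawn Maya is ,. nice"

# Hypothetical helper: return the text from the last occurrence of `word`
# that starts at a word boundary through the end of each element of x.
# `word` is used as a regular expression, so escape metacharacters first.
extract_from_last <- function(x, word) {
  pattern <- paste0(".*\\b(", word, ".*)$")
  out <- sub(pattern, "\\1", x)
  # sub() returns unmatched elements unchanged; mark them NA instead
  out[!grepl(paste0("\\b", word), x)] <- NA_character_
  out
}

extract_from_last(c(str1, "no match here"), "Maya")
#[1] "Maya is ,. nice" NA
```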
