Extract string between exact word and pattern using stringr

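I've been wondering how to use stringr, or another package, to extract the string between the exact words "to the" (always lowercase) and the second comma in a sentence.

For example:

String: "This not what I want to the THIS IS WHAT I WANT, DO YOU SEE IT?, this is not what I want"

Desired output: "THIS IS WHAT I WANT, DO YOU SEE IT?"

I have this vector: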

x <- c("This not what I want to the THIS IS WHAT I WANT, DO YOU SEE IT?, this is not what I want",
       "HYU_IO TO TO to the I WANT, THIS, this i dont, want",
       "uiui uiu to the xxxx,,this is not, what I want")

I'm trying this code:

library(stringr)
str_extract(string = x, pattern = "(?<=to the ).*(?=\\,)")

But I can't get it to work properly. I want it to give me this:

"THIS IS WHAT I WANT, DO YOU SEE IT?" 
"I WANT, THIS"           
"xxxx," 

Thank you very much for your time and help.
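A quick way to see why the attempt fails is to run it and inspect what comes back. The sketch below assumes only the stringr package and writes the lookahead as (?=,), since a comma needs no escaping in a regex. Because .* is greedy, it runs to the end of the string and then backs up only until the lookahead succeeds, which happens at the last comma of each string. So only the first element (which has exactly two commas) happens to come out right.

```r
library(stringr)

x <- c("This not what I want to the THIS IS WHAT I WANT, DO YOU SEE IT?, this is not what I want",
       "HYU_IO TO TO to the I WANT, THIS, this i dont, want",
       "uiui uiu to the xxxx,,this is not, what I want")

# Greedy .* consumes as much as possible, then backtracks only until the
# lookahead (?=,) succeeds, i.e. just before the LAST comma in each string.
greedy <- str_extract(x, "(?<=to the ).*(?=,)")
print(greedy)
# [1] "THIS IS WHAT I WANT, DO YOU SEE IT?"
# [2] "I WANT, THIS, this i dont"
# [3] "xxxx,,this is not"
```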

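An alternative approach. It can't compete with Gregor Thomas's answer, but it is another way to do it:

  1. Convert the vector to a tibble.
  2. Separate twice, first on "to the " and then on ",".
  3. Paste the two pieces back together.
  4. Pull the result out as a vector.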
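library(tidyverse)

tibble(value = x) %>% 
  # split on the literal "to the " and keep what follows it in rest
  separate(value, c("before", "rest"), sep = "to the ") %>% 
  # split the remainder on commas; keep the first two pieces, drop the rest silently
  separate(rest, c("a", "c"), sep = ",", extra = "drop") %>% 
  # glue the two pieces back together around a single comma
  mutate(x = paste0(a, ",", c), .keep = "unused") %>% 
  pull(x)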
[1] "THIS IS WHAT I WANT, DO YOU SEE IT?"
[2] "I WANT, THIS"                       
[3] "xxxx,"
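The same split-then-paste idea can also be sketched in base R, for anyone who would rather not load the tidyverse. This is an illustrative translation, not the answerer's code; it assumes every element contains "to the " followed by at least one comma. sub() with a lazy .*? removes everything up to the first "to the ", strsplit() cuts the remainder on commas, and the first two pieces are pasted back together.

```r
x <- c("This not what I want to the THIS IS WHAT I WANT, DO YOU SEE IT?, this is not what I want",
       "HYU_IO TO TO to the I WANT, THIS, this i dont, want",
       "uiui uiu to the xxxx,,this is not, what I want")

# Step 1: drop everything up to and including the first "to the " (lazy .*?).
rest <- sub("^.*?to the ", "", x, perl = TRUE)

# Step 2: split what is left on commas; fixed = TRUE treats "," literally.
pieces <- strsplit(rest, ",", fixed = TRUE)

# Step 3: paste the first two pieces back together around one comma.
result <- vapply(pieces, function(p) paste0(p[1], ",", p[2]), character(1))
print(result)
```

If an element had no comma after "to the ", p[2] would be NA and the result would end in ",NA", so a real version would need a guard for that case.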

You're close!

str_extract(string = x, pattern = "(?<=to the )[^,]*,[^,]*")
# [1] "THIS IS WHAT I WANT, DO YOU SEE IT?"
# [2] "I WANT, THIS"                       
# [3] "xxxx,"      

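The look-behind stays the same. [^,]* matches anything except a comma, then , matches exactly one comma, and [^,]* again matches anything except a comma. Your .* was greedy, so it ran on to the last comma in each string; the negated class [^,] cannot cross a comma, so the match stops just before the next one.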
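For comparison, the same pattern can be run in base R with regexpr() and regmatches(); perl = TRUE is needed for the look-behind. This is a sketch rather than part of the answer, and note one difference: regmatches() silently drops elements with no match, whereas str_extract() returns NA for them. The second call shows that [^,]* can also stop at the end of the string, so no third comma is required.

```r
x <- c("This not what I want to the THIS IS WHAT I WANT, DO YOU SEE IT?, this is not what I want",
       "HYU_IO TO TO to the I WANT, THIS, this i dont, want",
       "uiui uiu to the xxxx,,this is not, what I want")

# Same pattern as the stringr answer; perl = TRUE enables the look-behind.
pattern <- "(?<=to the )[^,]*,[^,]*"
base_result <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(base_result)

# The trailing [^,]* can run to the end of the string when there is no further comma.
s <- "a to the b, c"
no_third <- regmatches(s, regexpr(pattern, s, perl = TRUE))
print(no_third)
```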