Split sentence based on multiple patterns keeping delimiter

I have some sentences whose delimiter pattern is "(has|is|thinks)".

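I want to split each sentence in two, keep the delimiter at the start of the second piece, and remove all trailing whitespace, as shown below: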

mystr1 <- "the bird is now a dog"
mystr2 <- "the small cow thinks like a dog"
mystr3 <- "the fish has become a dog"

Desired result:

"the bird"          "is now a dog"
"the small cow"     "thinks like a dog"
"the fish"          "has become a dog"

Note: str_split(mystr3, "(has|is|thinks)", n = 2)

returns "the f" "h has become a dog"

because "is" is a delimiter and also occurs inside the word "fish".

What is the best way to do this?

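You can use a positive lookahead to keep the delimiter, together with word boundaries (\\b) so that the split never happens in the middle of a word.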

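split_sent <- function(x) {
    # Split just before the first whole-word delimiter; the lookahead is
    # zero-width, so the delimiter stays in the second piece.
    # Note the doubled backslashes: in an R string, '\b' is a backspace.
    trimws(stringr::str_split(x, '(?=\\b(has|is|thinks)\\b)', n = 2)[[1]])
}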

split_sent(mystr1)
#[1] "the bird"     "is now a dog"
split_sent(mystr2)
#[1] "the small cow"     "thinks like a dog"
split_sent(mystr3)
#[1] "the fish"         "has become a dog"
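
For comparison, here is a minimal base R sketch of the same idea without stringr. It assumes that only the first whole-word delimiter should split the sentence. The name split_sent_base and its pattern argument are hypothetical and not part of the answer above. Instead of a lookahead, it finds the start of the first match with regexpr() and cuts the string there with substr().

```r
# Hypothetical helper: split a single string before the first whole-word delimiter.
split_sent_base <- function(x, pattern = "\\b(has|is|thinks)\\b") {
    # Position of the first match, or -1 if there is none
    pos <- as.integer(regexpr(pattern, x, perl = TRUE))
    if (pos == -1L) {
        return(trimws(x))  # no delimiter found: return the trimmed input
    }
    c(
        trimws(substr(x, 1, pos - 1)),  # text before the delimiter, trimmed
        substr(x, pos, nchar(x))        # delimiter and everything after it
    )
}

split_sent_base("the fish has become a dog")
```

Because the word boundaries are part of the pattern, the "is" inside "fish" is skipped and the split happens before "has".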