Split a sentence on multiple patterns while keeping the delimiter
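I have some sentences whose delimiting pattern is "(has|is|thinks)".
I want to keep the delimiter at the start of the second part and remove any trailing whitespace, like this: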
mystr1 <- "the bird is now a dog"
mystr2 <- "the small cow thinks like a dog"
mystr3 <- "the fish has become a dog"
Desired result:
"the bird" "is now a dog"
"the small cow" "thinks like a dog"
"the fish" "has become a dog"
Note that
str_split(mystr3, "(has|is|thinks)", n = 2)
returns
"the f" "h has become a dog"
because "is" is one of the delimiters and also occurs inside "fish".
What is the best way to do this?
You can use a positive lookahead to keep the delimiter, plus word boundaries (\b) so the split never happens in the middle of a word.
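split_sent <- function(x) {
  # Zero-width lookahead splits *before* the first whole-word "has", "is" or "thinks",
  # so the delimiter stays in the second piece; trimws() drops the leftover spaces.
  # Note the doubled backslashes: '\b' alone is a backspace character in an R string.
  trimws(stringr::str_split(x, '(?=\\b(has|is|thinks)\\b)', n = 2)[[1]])
}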
split_sent(mystr1)
#[1] "the bird" "is now a dog"
split_sent(mystr2)
#[1] "the small cow" "thinks like a dog"
split_sent(mystr3)
#[1] "the fish" "has become a dog"
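If you'd rather avoid the stringr dependency, or want to process several sentences in one call, here is a minimal base R sketch (the name split_sent_base is hypothetical, not part of any package). It uses the same word-boundary pattern, but locates the first match with regexpr() and cuts the string with substr() instead of splitting. It avoids base strsplit() because that function is known to behave unexpectedly with zero-width lookahead patterns.

```r
# Hypothetical helper: split each sentence at the first whole-word
# "has", "is" or "thinks", keeping the delimiter in the second piece.
split_sent_base <- function(x, pattern = "\\b(has|is|thinks)\\b") {
  # Start position of the first match in each string (-1 when there is none)
  pos <- regexpr(pattern, x, perl = TRUE)
  lapply(seq_along(x), function(i) {
    if (pos[i] == -1) {
      return(trimws(x[i]))  # no delimiter found: return the sentence unsplit
    }
    # Everything before the match, then the match and everything after it
    trimws(c(substr(x[i], 1, pos[i] - 1),
             substr(x[i], pos[i], nchar(x[i]))))
  })
}

mystr1 <- "the bird is now a dog"
mystr2 <- "the small cow thinks like a dog"
mystr3 <- "the fish has become a dog"
split_sent_base(c(mystr1, mystr2, mystr3))
```

Each element of the returned list holds the two pieces for one sentence. Because the pattern requires word boundaries, the "is" inside "fish" is skipped and mystr3 is cut before "has".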