r gsub extract n words before and after a term

I need to extract the n words that appear before and after a term for a text analysis I'm working on. Here is a reproducible example:

a <- c("The day was nice and dry, when she came for our game we were ready and then she left.",
"The day was nice and dry, when she came for our game, but we were not ready. She left after she waited 5 minutes.",
"The day was nice and dry, when she came, we were not here. Our game  was not completed timely, but it was completed after one hour.")

Below is the call I'm using. It fails when the words around the term are next to punctuation or separated by a double space.

gsub(".*(( \\w{1,}){3} game( \\w{1,}){3}).*", "\\1", a, perl = TRUE)


[1] " came for our game we were ready"
[2] "The day was nice and dry, when she came for our game, but we were not ready. She left after she waited 5 minutes."
[3] "The day was nice and dry, when she came, we were not here. Our game  was not completed timely, but it was completed after one hour."
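gsub() returns an element unchanged when its pattern does not match, which is why [2] and [3] come back in full. A minimal base-R sketch that returns NA for non-matching elements instead uses regexpr() and regmatches() with the same (still too strict) single-space pattern; the variable names `pat`, `m` and `out` are illustrative only.

```r
a <- c("The day was nice and dry, when she came for our game we were ready and then she left.",
       "The day was nice and dry, when she came for our game, but we were not ready. She left after she waited 5 minutes.",
       "The day was nice and dry, when she came, we were not here. Our game  was not completed timely, but it was completed after one hour.")

# Same single-space pattern as the gsub() call, without the .* wrappers
pat <- "( \\w+){3} game( \\w+){3}"

m <- regexpr(pat, a, perl = TRUE)   # start of the first match per element, -1 if none
out <- rep(NA_character_, length(a))
out[m != -1] <- regmatches(a, m)    # regmatches() only returns the matched elements
out
```

This makes the failing elements visible as NA rather than silently echoing the whole sentence back.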

Here is the desired output:

[1] " came for our game we were ready"                                                                                                  
[2] " came for our game, but we were"                 
[3] " not here. Our game was not completed"

Instead of a literal space, try \W{1,} (one or more non-word characters) as the separator:

gsub(".*(((\\W{1,})\\w{1,}){3} game((\\W{1,})\\w{1,}){3}).*", "\\1", a, perl = TRUE)

[1] " came for our game we were ready"
[2] " came for our game, but we were"
[3] " not here. Our game  was not completed"

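The number of words on each side is hard-coded as {3} in the \W{1,} pattern. A small sketch that builds the same pattern for any n with sprintf() could look like this; `extract_context` is a hypothetical helper name, `term` is pasted into the regex unescaped (so regex metacharacters in it would need escaping), and elements without a match are still returned unchanged by gsub().

```r
a <- c("The day was nice and dry, when she came for our game we were ready and then she left.",
       "The day was nice and dry, when she came for our game, but we were not ready. She left after she waited 5 minutes.",
       "The day was nice and dry, when she came, we were not here. Our game  was not completed timely, but it was completed after one hour.")

# Hypothetical helper: keep n "words" on each side of `term`, where a word is
# \w+ preceded by any run of non-word characters (spaces, punctuation).
extract_context <- function(x, term, n = 3) {
  pat <- sprintf(".*(((\\W+)\\w+){%d} %s((\\W+)\\w+){%d}).*", n, term, n)
  gsub(pat, "\\1", x, perl = TRUE)
}

extract_context(a, "game", n = 2)
```

With n = 2 the third element becomes " here. Our game  was not", since ". " counts as the separator before "Our".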
Here is another approach, using str_extract from the stringr package:

library(stringr)

str_extract(a, "(( \\S+){3} game[[:punct:]\\s]*( \\S+){3})")

# [1] " came for our game we were ready"
# [2] " came for our game, but we were"
# [3] " not here. Our game  was not completed"
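Both answers keep the double space in the third result ("game  was"), while the desired output has a single space. A base-R sketch, assuming it is acceptable to collapse every whitespace run to one space, runs the str_extract pattern through regexpr()/regmatches() with perl = TRUE and then squeezes the whitespace with gsub(). Note that regmatches() drops elements with no match; all three match here.

```r
a <- c("The day was nice and dry, when she came for our game we were ready and then she left.",
       "The day was nice and dry, when she came for our game, but we were not ready. She left after she waited 5 minutes.",
       "The day was nice and dry, when she came, we were not here. Our game  was not completed timely, but it was completed after one hour.")

# Same pattern as the str_extract() answer, run through base R's PCRE engine
pat <- "( \\S+){3} game[[:punct:]\\s]*( \\S+){3}"
res <- regmatches(a, regexpr(pat, a, perl = TRUE))

# Collapse each run of whitespace to a single space; the leading space is kept
res <- gsub("\\s+", " ", res)
res
```

stringr's str_squish() would also collapse the spaces, but it trims the leading space as well, so the plain gsub() is closer to the desired output.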