How to extract a string from a sentence using regex in R?

I want to extract a string from a sentence using a regular expression in R. I am new to R and don't know where to start or how to do it.

string<-c(".\n                Written by\nJ-S-Golden            \n        
\n        \n         \n                Plot Summary\n    |\n        Plot 
Synopsis\n    \n        \n            Plot Keywords:\n wrongful 
imprisonment\n                        |\n escape from prison\n                        
|\n based on the works of stephen king\n                        |\n 
prison\n                        |\n voice over narration\n            | See 
All (296) »      \n        \n            Taglines:\nFear can hold you 
prisoner. Hope can set you free.        \n        \n")

I have the string above, and the output I want is:

Plot Keywords:
\n wrongful imprisonment\n
|\n escape from prison\n
|\n based on the works of stephen king\n                        
|\n prison\n                        
|\n voice over narration\n            
| See All (296) »      \n        \n

I don't know how to extract clean data from the string. Can anyone help me?

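Here is a solution using base R's sub function. The pattern matches (and captures) the leading text Plot Keywords:. It then uses a tempered dot to match any character up to, but not including, the first label followed by a colon (here, Taglines:). The (?s) flag lets the dot match newlines as well, and perl=TRUE is required because the negative lookahead is not supported by R's default regex engine.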

sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", string, perl=TRUE)

[1] "Plot Keywords:\n wrongful \nimprisonment\n
                    |\n escape from prison\n
                    \n|\n based on the works of
     stephen king\n
                    |\n \nprison\n                        |\n voice over narration\n
        | See \nAll (296) »      \n        \n            "
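If the goal is a clean vector of keywords rather than the raw block, the captured text can be post-processed with a few more base R calls. The following is only a sketch under assumptions: `sample_text` is a shortened, hypothetical stand-in for the string in the question, and the variable names `block`, `body`, and `keywords` are made up for illustration. It assumes the keywords are separated by `|` and that the trailing "See All" entry is not wanted.

```r
# Shortened, hypothetical sample in the same shape as the question's string.
sample_text <- paste0(
  "Written by\nJ-S-Golden\n Plot Keywords:\n wrongful imprisonment\n",
  " |\n escape from prison\n | See All (296)\n \n ",
  "Taglines:\nFear can hold you prisoner."
)

# Same pattern as in the answer; note the doubled backslash in "\\1".
block <- sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1",
             sample_text, perl = TRUE)

# Drop the label, split on the literal pipe separators, trim whitespace.
body <- sub("^Plot Keywords:", "", block)
keywords <- trimws(strsplit(body, "|", fixed = TRUE)[[1]])

# Discard empty pieces and the "See All (...)" trailer.
keywords <- keywords[nzchar(keywords) & !grepl("^See All", keywords)]

print(keywords)
# [1] "wrongful imprisonment" "escape from prison"
```

Splitting with `fixed = TRUE` avoids having to escape the pipe, which would otherwise be read as regex alternation.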

In this particular case, a pure regex demo may be more helpful than an R demo, so here is a link to one:

Demo