How to extract a string from a sentence using regex in R?
I want to extract a string from a sentence using a regular expression in R. I am new to R and don't know where to start or how to go about it.
string<-c(".\n Written by\nJ-S-Golden \n
\n \n \n Plot Summary\n |\n Plot
Synopsis\n \n \n Plot Keywords:\n wrongful
imprisonment\n |\n escape from prison\n
|\n based on the works of stephen king\n |\n
prison\n |\n voice over narration\n | See
All (296) » \n \n Taglines:\nFear can hold you
prisoner. Hope can set you free. \n \n")
I have this string, and the output I want is:
Plot Keywords:
\n wrongful imprisonment\n
|\n escape from prison\n
|\n based on the works of stephen king\n
|\n prison\n
|\n voice over narration\n
| See All (296) » \n \n
I don't know how to extract clean data from the string. Can anyone help me?
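Here is a solution using the sub function from base R. It matches (and includes) the leading text Plot Keywords:. It then uses a tempered dot (a dot guarded by a negative lookahead) to match any character up to, but not including, the first label followed by a colon, which in this string is Taglines:.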
sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", string, perl=TRUE)
[1] "Plot Keywords:\n wrongful \nimprisonment\n
|\n escape from prison\n
\n|\n based on the works of
stephen king\n
|\n \nprison\n |\n voice over narration\n
| See \nAll (296) » \n \n "
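If you want the individual keywords rather than the raw block, one possible follow-up is to drop the label, split on the pipe separators, and trim the whitespace. This is only a sketch: the short input `x` below is a made-up stand-in for the real string, and filtering out the trailing "See All" text is my assumption about what "clean" means here.

```r
# Toy input (hypothetical, shortened from the real string for illustration).
x <- "Plot Summary\n |\n Plot Synopsis\n Plot Keywords:\n prison\n |\n escape from prison\n | See All (296) \n Taglines:\nFear can hold you prisoner."

# Same pattern as in the answer; the replacement uses a doubled backslash
# so that R passes the backreference \1 through to the regex engine.
block <- sub("(?s).*(Plot Keywords:(?:(?![^: ]+:).)*).*", "\\1", x, perl = TRUE)

# Drop the leading label, split on the literal "|" separators, trim whitespace.
body <- sub("^Plot Keywords:", "", block)
keywords <- trimws(strsplit(body, "|", fixed = TRUE)[[1]])

# Assumption: the trailing "See All (...)" link text is not a keyword.
keywords <- keywords[!grepl("^See All", keywords)]

print(keywords)
```

Using `fixed = TRUE` in `strsplit` avoids having to escape the pipe, which would otherwise be read as regex alternation.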
In this particular case, a pure regex demo may be more helpful than an R demo, so here is a link to one: