Regex to replace wiki citation in R

Question

What regular expression removes citation markers from the text of a Wikipedia article?

Sample input:

 text <- "[76][note 7] just like traditional Hinduism regards the Vedas "

Expected output:

"just like traditional Hinduism regards the Vedas"

I tried:

> text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
> library(stringr)
> str_replace_all(text, "\\[ \\d+ \\]", "")
[1] "[76][note 7] just like traditional Hinduism regards the Vedas "

Answer 1

Try this:

text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
library(stringr)
# Remove every [...] group together with any whitespace that follows it
str_replace_all(text, "\\[[^\\]]*\\]\\s*", "")

Output:

[1] "just like traditional Hinduism regards the Vedas "

Answer 2

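This should do the trick:

trimws(sub("\\[.*\\]", "", text))

Result:

[1] "just like traditional Hinduism regards the Vedas"

This pattern looks for an opening bracket (\[), a closing bracket (\]), and everything in between (.*).

By default .* is greedy: it matches as much as possible, running past any intermediate closing and opening brackets until it reaches the last closing bracket. The match is replaced with an empty string.

Finally, the trimws function removes the spaces at the start and end of the result.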

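Edit: removing citations throughout the sentence

If citations appear in several places in the sentence, the pattern and the function change to:

trimws(gsub(" ?\\[.*?\\]", "", text))

For example, if the sentences are:

text1 <- "[76][note 7] just like traditional Hinduism [34] regards the Vedas "
text2 <- "[76][note 7] just like traditional Hinduism[34] regards the Vedas "

The corresponding results will be:

[1] "just like traditional Hinduism regards the Vedas"
[1] "just like traditional Hinduism regards the Vedas"

Pattern changes:

.*? changes the quantifier from greedy to lazy. That is, it matches the shortest possible string, stopping at the first closing bracket.

The leading " ?" (space + question mark) matches an optional space before the opening bracket.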
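To make the greedy versus lazy difference concrete, here is a small sketch that lists what each form of the pattern actually matches instead of replacing it. It uses only base R (gregexpr and regmatches); the variable names greedy and lazy are just for illustration, and the optional leading space is left out to keep the comparison simple.

```r
text1 <- "[76][note 7] just like traditional Hinduism [34] regards the Vedas "

# Greedy: .* runs to the LAST closing bracket, so there is one long match
greedy <- regmatches(text1, gregexpr("\\[.*\\]", text1))[[1]]

# Lazy: .*? stops at the FIRST closing bracket, so each citation is its own match
lazy <- regmatches(text1, gregexpr("\\[.*?\\]", text1))[[1]]

print(greedy)
print(lazy)
```

The greedy match swallows the words between the first and the last citation, which is why sub with the greedy pattern is only safe when all citations sit together at the start of the string.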

Answer 3

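This regex is one option:

(?!.*\]).*

The lookaround (the block in parentheses) greedily moves the pointer to just after the last "]". The rest of the expression, ".*", then matches what you want up to the end of the line (including the leading space, which is easy to strip in your language of choice).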
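The answer does not name an engine, so the following is only a sketch of how this pattern might be used in R. It assumes perl = TRUE, because the lookahead syntax needs PCRE, and it extracts the match with regexpr and regmatches rather than replacing anything. The names tail_part and result are illustrative.

```r
text <- "[76][note 7] just like traditional Hinduism regards the Vedas "

# (?!.*\]) only succeeds at a position with no "]" further along the line,
# so the match starts right after the last closing bracket.
tail_part <- regmatches(text, regexpr("(?!.*\\]).*", text, perl = TRUE))

# Strip the leading and trailing spaces that the match still carries
result <- trimws(tail_part)
print(result)
```

Note that this keeps only the text after the last "]", so it suits input where all citations come first; a citation in the middle of the sentence would cause everything before it to be dropped.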

Answer 4

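Your pattern \[ \d+ \] does not work because it contains literal spaces. Also, if you remove the spaces, the expression will only match [ + digits + ] and will not match substrings like [note 7].

Here is a base R solution using gsub with a TRE regex (no perl=TRUE needed):

text <- "[76][note 7] just like traditional Hinduism regards the Vedas "
trimws(gsub("\\[[^]]+]", "", text))
## Or to remove only those [] that contain digits/word + space + digits
trimws(gsub("\\[(?:[[:alnum:]]+[[:blank:]]*)?[0-9]+]", "", text))

See the R demo.

Pattern explanation:

\[ - a literal [ (must be escaped outside a character class)
(?:[[:alnum:]]+[[:blank:]]*)? - an optional sequence (because of the ? quantifier at the end) of 1 or more alphanumeric characters followed by 0 or more spaces or tabs
[0-9]+ - 1 or more digits
] - a literal ] (does not need escaping outside a character class)

trimws removes leading/trailing whitespace.

See the regex demo (note that the PCRE option is selected because it supports POSIX character classes; do not use that site to test your TRE regex patterns!).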
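If the same cleanup has to run over many strings, the first pattern could be wrapped in a small helper. The sketch below is not from the answer: strip_citations is a hypothetical name, and the extra step that collapses repeated spaces is an assumption about what the cleaned text should look like when a citation sits between two words.

```r
# Hypothetical helper built on the first gsub pattern above
strip_citations <- function(x) {
  x <- gsub("\\[[^]]+]", "", x)  # drop every [...] group
  x <- gsub(" {2,}", " ", x)     # collapse runs of spaces left behind
  trimws(x)                      # remove leading/trailing whitespace
}

cleaned <- strip_citations(c(
  "[76][note 7] just like traditional Hinduism regards the Vedas ",
  "just like traditional Hinduism [34] regards the Vedas "
))
print(cleaned)
```

Because gsub and trimws are vectorized, the helper accepts a character vector and returns one cleaned string per element.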

regex

r

stringr