How to remove all wording before a word using regex in R?

Question

I want to remove the words that come before 'not'. When I try the snippet below, I don't get the expected result.

test <- c("this will not work.", "'' is not one of ['A', 'B', 'C'].", "This one does not use period ending!")
gsub(".*(not .*)\\.", "\\1", test)

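However, if I replace the \. with [[:punct:]], it works fine. Can anyone tell me why the first version doesn't work? I may need to keep punctuation other than the period.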

Expected output:

> not work
> not one of ['A', 'B', 'C']
> not use period ending!

Thanks!

Answer 1

sub('.*(not.*?)\\.?$', '\\1', test)

[1] "not work"                   "not one of ['A', 'B', 'C']"
[3] "not use period ending!"

Answer 2

You can use a lookahead to remove everything before "not", plus an alternation that removes a trailing period.

gsub('.*(?=not)|\\.$', '', test, perl = TRUE)
#[1] "not work"     "not one of ['A', 'B', 'C']" "not use period ending!"

Answer 3

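Here is a translation of your original pattern: match any characters, then "not " and everything after it (the captured group), then a literal period.

If a string does not match that whole pattern, period included, you get no match, and gsub() returns the string unchanged. So adding [[:punct:]] makes sense, because then you are saying: "match everything in that pattern, followed by any kind of punctuation rather than just a period."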
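To make the no-match behaviour concrete, here is a minimal sketch that reuses the `test` vector from the question. The names `matched` and `result` are only illustrative. It checks which elements match the period-terminated pattern and then shows that gsub() returns the non-matching element as it was.

```r
test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

# Which elements contain "not ..." followed by a literal period?
matched <- grepl(".*(not .*)\\.", test)
matched

# gsub() only rewrites elements where the pattern matches;
# the third string has no period, so it comes back untouched.
result <- gsub(".*(not .*)\\.", "\\1", test)
result
```

Only the first two elements match, so only those two are reduced to the captured "not ..." part.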

If you don't want to use [[:punct:]], you can use this:

(?:.*(not\s+.*)\.?).+?$

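This says: match anything up to "not", capture "not", the whitespace after it, and the rest of the text, then consume an optional period and at least one more character up to the end of the string. In practice the final character of the string always falls outside the captured group, whatever it is.

The output of this regex is as follows: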

[1] "not work"                   "not one of ['A', 'B', 'C']"
[3] "not use period ending"
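The answer does not show the R call it used, so the following is only a sketch of one way to apply the pattern. It assumes sub() with perl = TRUE and a "\\1" replacement; `pattern` and `out` are illustrative names. The backslashes are doubled because the pattern sits inside an R string literal.

```r
test <- c("this will not work.",
          "'' is not one of ['A', 'B', 'C'].",
          "This one does not use period ending!")

# Assumption: PCRE via perl = TRUE; the original call is not given.
pattern <- "(?:.*(not\\s+.*)\\.?).+?$"

# Keep only the captured "not ..." group; the trailing .+? always
# consumes the last character, whether it is "." or "!".
out <- sub(pattern, "\\1", test, perl = TRUE)
out
```

Under those assumptions this should reproduce the three strings listed above, with the final "!" dropped from the third.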

The example above does strip the "!". If you want to keep it, I would use [[:punct:]], or you can list the punctuation characters to match explicitly, like this:

[!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
