In R, gsub & Regex lookahead or lookbehind expression to remove everything BEFORE a string pattern?

Question

In R, I have a data frame with one column in which every row contains repeated text, matching a specific pattern, that I want to remove:

x <- c("DOI: 10.5256/f1000research.6541.r7660 The revised article answers most of my remarks and questions in a ... Continue reading The revised article answers most of my remarks and questions in a satisfactory way.", 
"DOI: 10.5256/f1000research.6601.r7701 The revision ... Continue reading The revision is approved I have read this", 
"DOI: 10.5256/f1000research.6599.r7859 I have read the revised article by Horrell and D'Orazio. They have responded appropriately to ... Continue reading I have read the revised article by Horrell and D'Orazio. They have responded appropriately to the concerns/questions raised")

What function can I use to remove everything before "... Continue reading" or "Continue reading", including the "... Continue reading" or "Continue reading" text itself?

Answer 1

Use sub.

To remove everything up to and including Continue reading:

sub(".*Continue reading", "", x)

To remove everything before Continue reading but keep Continue reading itself:

sub(".*(?=\\bContinue reading)", "", x, perl=TRUE)

or

sub(".*\\b(Continue reading)", "\\1", x)
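As a minimal sketch, the three calls above can be tried on a shortened input. The vector `y` and the result names are made up for illustration and stand in for the `x` from the question. Inside an R string literal the backslash has to be doubled: a single "\b" is read as a backspace character and a single "\1" as a control character, so the pattern would not do what is intended. Because `.*` is greedy, the match runs up to the last occurrence of the marker in each string.

```r
# Hypothetical, shortened stand-in for the question's vector x
y <- c("DOI: 10.1/abc First part ... Continue reading First part in full.",
       "DOI: 10.1/def No marker in this one")

# Drop everything up to and including the marker
drop_marker <- sub(".*Continue reading", "", y)

# Drop everything before the marker but keep the marker
# (lookahead syntax requires perl = TRUE)
keep_marker_lookahead <- sub(".*(?=\\bContinue reading)", "", y, perl = TRUE)

# Same idea with a capture group and a backreference
keep_marker_group <- sub(".*\\b(Continue reading)", "\\1", y)

print(drop_marker)
print(keep_marker_lookahead)
print(keep_marker_group)
```

Strings that do not contain the marker are returned unchanged, and the first variant leaves the space that followed the marker, so a `trimws()` call may be wanted afterwards.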

Answer 2

This should remove everything before Continue reading:

sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', x)

If you need to remove the characters before ... Continue reading (keeping the three dots):

sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', x)
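A short sketch of how these two patterns differ, using a made-up vector `z` (not the data from the question). Both patterns require three literal dots before the marker, so a string with a bare "Continue reading" is left untouched.

```r
# Hypothetical inputs: one with "... Continue reading", one with the bare marker
z <- c("Intro text ... Continue reading Intro text in full",
       "Intro text Continue reading Intro text in full")

# Keep "Continue reading" onward; the three dots fall outside the capture group
from_marker <- sub('.*\\.{3}\\s*(Continue reading.*)$', '\\1', z)

# Keep "... Continue reading" onward; the dots are inside the capture group
from_dots <- sub('.*(\\.{3}\\s*Continue reading.*)$', '\\1', z)

print(from_marker)
print(from_dots)
```

If some rows may lack the dots, the simpler `.*\\b(Continue reading)` pattern from Answer 1 is likely the safer choice.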

regex

r

gsub

regex-lookarounds