R: Remove end of text after matching string

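I would like to remove any text that occurs after a match of THE END or FINIS. I know this is very similar to other topics, but my regular-expression skills are not strong enough to get it working.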

My texts are Shakespeare books from Project Gutenberg. They typically look like:

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  FINIS  <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

My ideal looks like gsub("^[THE END]*|^[FINIS]*", "", txt), returning "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt"

You are very close. You have to use:

gsub("(THE END|FINIS).*", "", txt)
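To see why the attempt in the question leaves the text untouched: [THE END] is a character class (any one of T, H, E, space, N, D), and ^ anchors it to the start of the string, so it only ever matches an empty run at the beginning. A group with alternation, (THE END|FINIS), matches the literal markers instead. The following is a minimal sketch on shortened stand-in strings (not the full Gutenberg text); the variable names and the optional whitespace trim are my own additions.

```r
# Shortened stand-ins for the two endings shown in the question.
txt <- c(
  "she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC VERSION ...",
  "she will be tam'd so. Exeunt  FINIS  <<THIS ELECTRONIC VERSION ..."
)

# The question's pattern: anchored character classes, so nothing is removed.
attempt <- gsub("^[THE END]*|^[FINIS]*", "", txt)

# Alternation matches either marker; ".*" then consumes the rest of the string.
cleaned <- gsub("(THE END|FINIS).*", "", txt)

# Optional: also drop the whitespace that precedes the marker.
trimmed <- gsub("[[:space:]]*(THE END|FINIS).*", "", txt)

print(attempt)
print(cleaned)
print(trimmed)
```

Note that cleaned still ends with the spaces that preceded the marker; trimmed removes them as well.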

By the way, as thelatemail pointed out in his comment, sub is enough here, since only a single replacement is made.
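A small sketch of that point, again on a made-up shortened string: because .* runs to the end of the string, there is only one match to replace, so sub and gsub should agree. The sample also contains a line break, as the txt in the question does. As far as I know, R's default regex engine lets . match a line break, whereas with perl = TRUE you would need the (?s) flag for the same behaviour; treat that part as an assumption to verify on your own data.

```r
# Hypothetical sample with an embedded line break, as in the question's txt.
txt <- "she will be tam'd so. Exeunt  FINIS  <<THIS ELECTRONIC\n  VERSION ..."

# ".*" already runs to the end of the string, so there is only one match
# to replace and sub() gives the same result as gsub().
with_sub  <- sub("(THE END|FINIS).*", "", txt)
with_gsub <- gsub("(THE END|FINIS).*", "", txt)

# With the PCRE engine, "." stops at "\n" unless (?s) is set.
with_perl <- sub("(?s)(THE END|FINIS).*", "", txt, perl = TRUE)

print(c(with_sub, with_gsub, with_perl))
```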