R: Remove end of text after matching string
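I want to remove any text that appears after a match of THE END
or FINIS.
I know this is very similar to other topics, but my regex skills aren't good enough to get this working.
My texts are Shakespeare's works from Project Gutenberg. They typically look like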
txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder,
by your leave, she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE
COMPLETE WORKS OF WILLIAM ..."
or
txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder,
by your leave, she will be tam'd so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE
COMPLETE WORKS OF WILLIAM ..."
Ideally I'd have something like gsub("^[THE END]*|^[FINIS]*", "", txt)
that returns "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt"
You were close. You can use:
gsub("(THE END|FINIS).*", "", txt)
By the way, as thelatemail pointed out in his comment, sub
is enough, since only one match needs to be replaced.
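For reference, here is a minimal sketch of the sub() variant on shortened versions of the sample strings. The attempted pattern did not work because square brackets define a character class (any one of the listed letters) rather than a literal word, and ^ anchors the match to the start of the string. The `[[:space:]]*` prefix is my own addition to drop the space left before the marker, and the variable name `cleaned` is only illustrative. Note that this cuts at the first occurrence of either marker, so it assumes neither phrase appears earlier in the play's text.

```r
# Shortened sample strings modelled on the question; the embedded newline is kept on purpose.
txt <- c(
  "she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE\nCOMPLETE WORKS OF WILLIAM ...",
  "she will be tam'd so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE\nCOMPLETE WORKS OF WILLIAM ..."
)

# Remove the marker, any whitespace just before it, and everything after it.
# sub() replaces only the first match, which is all that is needed here.
# With R's default regex engine, "." also matches the newline inside the string.
cleaned <- sub("[[:space:]]*(THE END|FINIS).*", "", txt)

# With perl = TRUE, "." stops at a newline unless the (?s) flag is set.
cleaned_perl <- sub("(?s)[[:space:]]*(THE END|FINIS).*", "", txt, perl = TRUE)

print(cleaned)
stopifnot(identical(cleaned, cleaned_perl))
```

Both calls are vectorised over `txt`, so the same line should work on a character vector holding many plays at once.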