How to delete lines that start with * using regex in R

Question

I am currently using regular expressions in R to delete lines that start with *, for example:

* Wikipedia started from the public domain version reprinted by the [http://www.ccel.org/ Christian Classics Ethereal Library].  
* James William Richard (1898). From [[Internet Archive]].
* [http://www.melanchthon.de/e/ The Phillip Melanchthon Quinquennial]

I tried the gsub function with a regular expression such as:

gsub("^[\*]+[\s\[A-Za-z,;'\"\s]+[.?!\]]$","",tex1)

But nothing happened. Can you help me find out what is wrong with this expression?

Answer 1

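If you have a character vector v, it is better to test whether each element matches the pattern you are looking for, so you need grepl rather than gsub.

You can do it like this: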

v <- c("hello", "*hi", "world")
# The backslash must be doubled inside an R string literal: "^\\*" is the regex ^\*
v[!grepl("^\\*", v)] # finds the elements that begin with * and negates the result
#[1] "hello" "world"
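If the text is a single string with embedded newlines rather than a vector of lines, one possible way to reuse the same idea is to split it into lines first, filter, and join the result again. This is only a sketch: the contents of tex1 below are a made-up sample, and the names lines, kept, and result are illustrative.

```r
# Hypothetical sample text: one string with embedded newlines
tex1 <- "Intro paragraph.\n* James William Richard (1898). From [[Internet Archive]].\nClosing paragraph."

# Split the string into a character vector with one element per line
lines <- strsplit(tex1, "\n", fixed = TRUE)[[1]]

# Keep only the lines that do not begin with a literal *
kept <- lines[!grepl("^\\*", lines)]

# Join the remaining lines back into a single string
result <- paste(kept, collapse = "\n")
```

Here result holds the two paragraphs separated by a single newline, with the starred line gone.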

Answer 2

To delete the lines that start with *:

gsub("(?m)^\\*.*\n?", "", x, perl = TRUE)
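A small worked example of the multiline approach may help. It assumes x is a single string containing newlines (the sample value is invented for illustration) and uses gsub so that every matching line is removed, not only the first. With perl = TRUE, the (?m) flag makes ^ match at the start of each line.

```r
# Hypothetical input: a single string with four lines
x <- "keep this\n* drop this\nkeep this too\n* drop this as well"

# (?m) turns on multiline mode; \\* is a literal asterisk;
# .* consumes the rest of the line and \n? its trailing newline, if any
cleaned <- gsub("(?m)^\\*.*\n?", "", x, perl = TRUE)

cat(cleaned)
```

Note that the newline after the last kept line is preserved, so cleaned ends with "\n" in this sample.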
