Invalid regular expression in gsub

Why does my email regex fail with the error: invalid regular expression '^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$', reason 'Invalid character range'?

blogs.smpl <- "mail:mami@yahoo.com: subject:Lorem Ipsum body:   is simply dummy text of the printing and typesetting industry. 
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"

blogs.smpl <- gsub("^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$","",blogs.smpl)

The cause is this part:

[a-zA-Z0-9-.]

Try moving the hyphen to the end, like this:

[a-zA-Z0-9.-]

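This is because a - is only taken literally at the start or end of a character class. Anywhere else, it denotes a range between the characters before and after it.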
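As a minimal sketch of the fix, the snippet below uses the corrected pattern with the hyphen moved to the end of the last class. The test addresses and the shortened sample text are hypothetical, chosen only for illustration. It also shows an unanchored variant, on the assumption that the goal is to strip an address embedded in longer text: with ^ and $ the pattern only matches when the whole string is an address.

```r
# Corrected pattern: the hyphen is the last character of the final class.
# Note the doubled backslash, which R string literals need for a literal dot.
pattern <- "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+$"

# Hypothetical inputs, not taken from the original question.
emails <- c("mami@yahoo.com", "not an email")
is_email <- grepl(pattern, emails)
print(is_email)  # TRUE FALSE

# Unanchored variant (an assumption about the intent): removes an address
# that appears inside a longer string.
unanchored <- "[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+"
text <- "mail:mami@yahoo.com: subject:Lorem Ipsum"
cleaned <- gsub(unanchored, "", text)
print(cleaned)   # "mail:: subject:Lorem Ipsum"
```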

The last character class is wrong: [a-zA-Z0-9-.]. It must be changed to [a-zA-Z0-9.-].

Note: in R, you cannot escape a hyphen inside a character class to match a literal hyphen unless you use perl=TRUE.
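To illustrate the perl=TRUE case, here is a small sketch in which an escaped hyphen sits in the middle of a class and is matched literally by the PCRE engine. The input vector is hypothetical.

```r
# Hypothetical inputs: one with a hyphen, one with a dot, one with neither.
x <- c("a-b", "a.b", "ab")

# With perl = TRUE, \\- inside the class is a literal hyphen, so the class
# matches a digit, a hyphen, or a dot.
has_match <- grepl("[0-9\\-.]", x, perl = TRUE)
print(has_match)  # TRUE TRUE FALSE
```

Placing the hyphen first or last, as in [0-9.-], avoids the need for escaping and works with both the default engine and perl=TRUE.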

Also, see the R String Manipulation PDF for more on R character classes (page 2) and on regular expressions in general. Here is an excerpt:

Here is a set of rules on how to match characters as regular characters inside a character class: To match ] inside a character class put it first.

To match - inside a character class put it first or last.

To match ^ inside a character class put it anywhere, but first.

To match any other character or metacharacter (but \) inside a character class put it anywhere.
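The rules in the excerpt can be checked with a few short calls. This is only a sketch using R's default regex engine. The single-character inputs are chosen for illustration.

```r
# ] is literal when it comes first in the class.
bracket_first <- grepl("[]a]", "]")

# - is literal when it comes first or last.
hyphen_first <- grepl("[-a]", "-")
hyphen_last  <- grepl("[a-]", "-")

# ^ is literal anywhere except the first position (where it negates the class).
caret_later <- grepl("[a^]", "^")

# Other metacharacters lose their special meaning inside a class:
# [.+] matches a literal dot or plus, not "any character".
plus_literal <- grepl("[.+]", "+")
dot_not_wild <- grepl("[.+]", "x")

print(c(bracket_first, hyphen_first, hyphen_last, caret_later, plus_literal))
# TRUE TRUE TRUE TRUE TRUE
print(dot_not_wild)
# FALSE
```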