How do I remove entire strings if they contain a matched pattern in R?
Suppose I have the following string:
vector <- "this is a string of text containing stuff. something.com thisthat@co.uk and other stuff with something.anything"
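I want to remove the words that contain "@" or ".", so I want to remove something.com, thisthat@co.uk and something.anything. I do not want to remove "stuff.", because the "." there only ends the sentence and is not inside the word. Ideally, I would like to be able to do this with the %>% pipe.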
gsub(" ?\\w+[.@]\\S+", "", vector)
[1] "this is a string of text containing stuff. and other stuff with"
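Since the question asks for a pipe-friendly version, here is a minimal sketch of one way to do it. The helper name remove_dotted_words is hypothetical (it is not part of the original answer), and the native |> pipe assumes R 4.1 or later; with magrittr loaded, vector %>% remove_dotted_words() should behave the same way.

```r
vector <- "this is a string of text containing stuff. something.com thisthat@co.uk and other stuff with something.anything"

# Hypothetical helper: wraps the gsub() call so that the string is the
# first argument, which is what a pipe passes along.
remove_dotted_words <- function(x) {
  # optional leading space, word characters, a "." or "@",
  # then at least one non-space character
  gsub(" ?\\w+[.@]\\S+", "", x)
}

# Native pipe (assumes R >= 4.1)
cleaned <- vector |> remove_dotted_words()
cleaned
```

Wrapping the call avoids having to place the piped value into gsub()'s third argument by hand.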
An alternative to the (more terse/simple) gsub approach:
gre <- gregexpr("[^ ]+[.@][^ ]+", vector)
regmatches(vector, gre)
# [[1]]
# [1] "something.com" "thisthat@co.uk" "something.anything"
regmatches(vector, gre) <- ""
vector
# [1] "this is a string of text containing stuff.   and other stuff with "
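The advantage of this approach is that the replacement can be arbitrary. Admittedly, we are only replacing the matches with "" here, so it is somewhat overkill, but if you need to change the values in some way (altering each matched substring), this is a more powerful mechanism.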
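To illustrate what "arbitrary replacement" might look like, here is a small sketch that transforms each match instead of deleting it. Upper-casing is only an example transformation chosen for illustration, and x is a copy introduced here so that the original vector is left untouched.

```r
x <- "this is a string of text containing stuff. something.com thisthat@co.uk and other stuff with something.anything"

# Locate every space-delimited token that has a "." or "@" inside it
gre <- gregexpr("[^ ]+[.@][^ ]+", x)

# Replace each match with a transformed version of itself;
# regmatches(x, gre) is a list with one character vector per input string
regmatches(x, gre) <- lapply(regmatches(x, gre), toupper)
x
# [1] "this is a string of text containing stuff. SOMETHING.COM THISTHAT@CO.UK and other stuff with SOMETHING.ANYTHING"
```

Any function that maps a character vector to a character vector of the same length could take the place of toupper here.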