Using grepl for multiple texts

Question

Suppose I have the following variables:

a <- c('one','two','three')
b <- c('one|on','two|wo',"three|thre")
c <- c('there is one','there one is ','there is one three two')

I want a new variable with the following result:

 d
 [1] "one"   "one"   "three"

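What I want to do is check whether a text contains the word one or on, and if so assign the value one to the new variable d. Also, if more than one value of a matches, the hierarchy should start from the last value, i.e. the last matching value wins.

What I can do is: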

d <- list()
d[grepl(b[1],c)] <- a[1]
d[grepl(b[2],c)] <- a[2]
d[grepl(b[3],c)] <- a[3]
d <- unlist(d)

The same can be done in a simple loop. But is there a more elegant way?
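For reference, the loop mentioned above could be sketched roughly as follows. Pre-filling d with NA is an assumption made here so that texts matching no pattern stay visibly unmatched; the code above builds a list instead.

```r
a <- c('one', 'two', 'three')
b <- c('one|on', 'two|wo', 'three|thre')
c <- c('there is one', 'there one is ', 'there is one three two')

# Assumption: start with NA so that unmatched texts remain NA.
d <- rep(NA_character_, length(c))
for (i in seq_along(b)) {
  # Later patterns overwrite earlier ones, so the last matching value wins.
  d[grepl(b[i], c)] <- a[i]
}
d
# [1] "one"   "one"   "three"
```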

Answer 1

It is not all that elegant, but this function does what you want:

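funny_replace <- function(c, b, a) {

   # Largest value of x, or NULL if x is empty
   max_or_null <- function(x) {
      if (length(x) != 0) max(x) else NULL
   }

   # Indices of the patterns in b that match the single string x
   multi_grep <- function(b, x) {
      which(sapply(b, grepl, x))
   }

   # Value from a for the last pattern in b that matches the string s
   replace_one <- function(s, b, a) {
      a[max_or_null(multi_grep(b, s))]
   }

   # Apply replace_one to every text in c
   unlist(sapply(c, replace_one, b, a))
}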
funny_replace(c, b, a)
#      there is one          there one is  there is one three two 
#             "one"                  "one"                "three"

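It works as follows: max_or_null returns the maximum of a vector, or NULL if the vector is empty. It is used later to make sure that elements of c for which no pattern in b matches are handled correctly.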
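A small sketch of why the helper is needed, using the same definition as in the function above. The example inputs are made up for illustration.

```r
max_or_null <- function(x) {
  if (length(x) != 0) max(x) else NULL
}

max_or_null(c(1, 3))      # 3
max_or_null(integer(0))   # NULL

# Plain max() on an empty vector would return -Inf with a warning instead.
# Indexing with NULL selects nothing, so no element of a is picked:
a <- c('one', 'two', 'three')
a[max_or_null(integer(0))]  # character(0)
```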

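multi_grep searches for multiple patterns in a single string (the usual grep does the opposite: it searches for one pattern in multiple strings) and returns the indices of the patterns that were found.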
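To illustrate, here is multi_grep on its own, applied to single strings. The string 'no match here' is a made-up input, and unname() is only used to drop the pattern names that sapply attaches to the result.

```r
b <- c('one|on', 'two|wo', 'three|thre')

multi_grep <- function(b, x) {
  which(sapply(b, grepl, x))
}

# One string, several patterns: the result holds the positions in b that matched.
unname(multi_grep(b, 'there is one three two'))  # 1 2 3
unname(multi_grep(b, 'there one is '))           # 1
unname(multi_grep(b, 'no match here'))           # integer(0)
```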

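replace_one takes a single string and uses multi_grep to check which of the patterns in b are found in it. It then uses max_or_null to return the largest of these indices, or NULL if nothing matched. Finally, the element with that index is picked from a.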

replace_one is then applied to each element of c to get the desired result.
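One consequence worth noting: as far as I can tell, a text that matches no pattern yields an empty result from replace_one, and unlist then drops it, so the output can be shorter than c. If that is not wanted, a possible variant is sketched below. The names replace_one_na and texts are hypothetical and not part of the original answer, and returning NA is an assumption about the desired behaviour.

```r
a <- c('one', 'two', 'three')
b <- c('one|on', 'two|wo', 'three|thre')
texts <- c('there is one', 'nothing here', 'there is one three two')

# Hypothetical variant of replace_one: NA instead of NULL when nothing matches.
replace_one_na <- function(s, b, a) {
  hits <- which(vapply(b, grepl, logical(1), x = s))
  if (length(hits) == 0) NA_character_ else a[max(hits)]
}

# vapply guarantees one character value per text, so the length is preserved.
d <- unname(vapply(texts, replace_one_na, character(1), b = b, a = a))
d
# returns c("one", NA, "three")
```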

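I think this is a more functional solution, in style, than yours or a for loop, because it avoids the repeated assignments. On the other hand, it seems a bit complicated.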

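By the way: I used a, b and c everywhere to make it easier to match my code to your example. However, this is not good practice; in particular, c is also the name of a base R function.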

