How to use str_detect within across when searching multiple columns for several search strings

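I am hoping to migrate my function to the newly introduced across.

I used filter_at to build a function that searches several columns for several keywords.

However, I am struggling to replicate it using across, as shown below: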

library(tidyverse)

raw_df <- tibble::tribble(
  ~cust_name, ~other_desc, ~trans, ~val,
     "Cisco",   "nothing",    "a", 100L,
    "bad_cs",     "cisCo",    "s", 101L,
       "Ibm",   "nothing",    "d", 102L,
    "bad_ib",       "ibM",    "f", 102L,
    "oraCle",    "Oracle",    "g", 103L,
      "mSft",   "nothing",    "k", 103L,
      "noth",      "Msft",    "j", 104L,
      "noth",    "oracle",    "l", 104L
  )


search_string = c("ibm", "cisco")


# Done using `filter_at`
raw_df %>% 
  filter_at(.vars = vars(cust_name, other_desc),
            .vars_predicate = any_vars(str_detect(., regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))
            
  ) %>% unique()
  
  
# Not able to replicate result with `across`
raw_df %>% 
  filter(across(
    .cols = c(cust_name, other_desc), 
    .fns = ~str_detect(.), regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))



raw_df %>% 
  filter(str_detect,
         across(any_of(cust_name, other_desc),
         regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))

You can use across with Reduce to select the rows where any of the patterns occurs.

library(dplyr)
library(stringr)

pat <- paste(search_string, collapse = "|")

raw_df %>% 
  filter(Reduce(`|`, across(c(cust_name, other_desc), 
        ~str_detect(., regex(pat, ignore_case = TRUE)))))
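
To make the Reduce step less opaque, here is a minimal base R sketch of what it does to the per-column results. The `hits` list is a made-up stand-in for what across returns (one logical vector per column); it is not taken from raw_df.

```r
# Hypothetical per-column match results, one logical vector per column
hits <- list(
  cust_name  = c(TRUE, FALSE, FALSE),
  other_desc = c(FALSE, TRUE, FALSE)
)

# Reduce() folds `|` over the list: hits[[1]] | hits[[2]] | ...
# The result is TRUE for every row where at least one column matched.
any_hit <- Reduce(`|`, hits)
any_hit
```

filter then keeps exactly the rows where this combined vector is TRUE.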

However, I think using if_any is a better fit here, since it is built to handle exactly this case:

raw_df %>%
  filter(if_any(c(cust_name, other_desc), 
                ~str_detect(., regex(pat, ignore_case = TRUE))))

# cust_name other_desc trans   val
#  <chr>     <chr>      <chr> <int>
#1 Cisco     nothing    a       100
#2 bad_cs    cisCo      s       101
#3 Ibm       nothing    d       102
#4 bad_ib    ibM        f       102
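
As a side note, if_any has a counterpart, if_all, for when every selected column has to match. Below is a small sketch contrasting the two; the tibble `df` and the pattern are invented for illustration (same column names as raw_df, different rows), and it assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for raw_df, reduced to the two searched columns
df <- tibble(
  cust_name  = c("Cisco", "bad_cs", "oraCle", "noth"),
  other_desc = c("nothing", "cisCo", "Oracle", "Msft")
)

pat_demo <- regex("cisco|oracle", ignore_case = TRUE)

# if_any(): keep rows where at least one of the columns matches
any_rows <- df %>%
  filter(if_any(c(cust_name, other_desc), ~ str_detect(.x, pat_demo)))

# if_all(): keep rows where every one of the columns matches
all_rows <- df %>%
  filter(if_all(c(cust_name, other_desc), ~ str_detect(.x, pat_demo)))
```

Here any_rows keeps the first three rows, while all_rows keeps only the oraCle / Oracle row.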

Ronak's solution is the one to use, but here is an alternative with an extra trick. I think this is what if_any does: using rowSums:

rowAny <- function(x) rowSums(x) > 0 


raw_df %>% 
    filter(rowAny(
        across(
            .cols = c(cust_name, other_desc),
            .fns = ~ str_detect(., regex("ibm|cisco", ignore_case = TRUE))
        )))

Output:

  cust_name other_desc trans   val
  <chr>     <chr>      <chr> <int>
1 Cisco     nothing    a       100
2 bad_cs    cisCo      s       101
3 Ibm       nothing    d       102
4 bad_ib    ibM        f       102
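
One caveat with the rowSums trick: str_detect returns NA for missing strings, and rowSums propagates that NA to the whole row. A minimal sketch of the difference follows; the `hits` data frame is invented and `rowAnyNA` is a hypothetical variant, not part of the answer above.

```r
rowAny   <- function(x) rowSums(x) > 0
# Hypothetical variant that ignores NA instead of propagating it
rowAnyNA <- function(x) rowSums(x, na.rm = TRUE) > 0

# Made-up match results; NA stands for a missing cust_name
hits <- data.frame(
  cust_name  = c(TRUE, NA, NA),
  other_desc = c(FALSE, FALSE, TRUE)
)

strict  <- rowAny(hits)    # rows 2 and 3 become NA
lenient <- rowAnyNA(hits)  # row 3 counts as a match, row 2 does not
```

Since filter drops rows where the condition is NA, the plain rowAny would lose row 3 even though other_desc matched. If the searched columns can contain NA, the na.rm = TRUE variant may be the safer choice.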