How to use str_detect within across when searching multiple columns for several search strings
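I'd like to migrate my functions to the newly introduced `across`.
I used `filter_at` to search several keywords across multiple columns.
However, I'm struggling to replicate it with `across`, as shown below: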
library(tidyverse)
raw_df <- tibble::tribble(
~cust_name, ~other_desc, ~trans, ~val,
"Cisco", "nothing", "a", 100L,
"bad_cs", "cisCo", "s", 101L,
"Ibm", "nothing", "d", 102L,
"bad_ib", "ibM", "f", 102L,
"oraCle", "Oracle", "g", 103L,
"mSft", "nothing", "k", 103L,
"noth", "Msft", "j", 104L,
"noth", "oracle", "l", 104L
)
search_string = c("ibm", "cisco")
# Done using `filter_at`
raw_df %>%
filter_at(.vars = vars(cust_name, other_desc),
.vars_predicate = any_vars(str_detect(., regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))
) %>% unique()
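Every version in this thread builds its pattern by joining the search strings with `|`, which is a regex alternation. Here is a minimal sketch of that step on its own, assuming only stringr is loaded. The sample strings are taken from `raw_df`:

```r
library(stringr)

search_string <- c("ibm", "cisco")

# Join the keywords into one alternation pattern: "ibm|cisco"
pat <- paste(search_string, collapse = "|")

# regex(..., ignore_case = TRUE) makes the match case-insensitive,
# so "Cisco" and "ibM" match while "nothing" and "oracle" do not
hits <- str_detect(c("Cisco", "nothing", "ibM", "oracle"),
                   regex(pat, ignore_case = TRUE))
print(pat)
print(hits)
```

Keywords that contain regex metacharacters (for example `.` or `+`) would need escaping before they are joined this way.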
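# Not able to replicate the result with `across`
# Attempt 1: the lambda `~str_detect(.)` closes before the pattern, so str_detect()
# is called with its `pattern` argument missing
raw_df %>%
filter(across(
.cols = c(cust_name, other_desc),
.fns = ~str_detect(.), regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))
# Attempt 2: filter() receives str_detect itself (a function, not a logical vector),
# and any_of() expects a character vector such as c("cust_name", "other_desc")
raw_df %>%
filter(str_detect,
across(any_of(cust_name, other_desc),
regex(paste(search_string, collapse = "|"), ignore_case = TRUE)))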
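Even with the lambda fixed, `across` on its own does not replace `any_vars`. Inside `filter`, its per-column results are combined with AND, and recent dplyr releases deprecate that usage in favour of `if_all`. The sketch below writes that AND behaviour explicitly with `if_all`. It assumes dplyr 1.0.4 or later, the release that added `if_any`/`if_all`:

```r
library(dplyr)
library(stringr)

raw_df <- tibble::tribble(
  ~cust_name, ~other_desc, ~trans, ~val,
  "Cisco",    "nothing",   "a",    100L,
  "bad_cs",   "cisCo",     "s",    101L,
  "Ibm",      "nothing",   "d",    102L,
  "bad_ib",   "ibM",       "f",    102L,
  "oraCle",   "Oracle",    "g",    103L,
  "mSft",     "nothing",   "k",    103L,
  "noth",     "Msft",      "j",    104L,
  "noth",     "oracle",    "l",    104L
)
pat <- paste(c("ibm", "cisco"), collapse = "|")

# if_all() keeps a row only when *every* selected column matches,
# which is what a bare across() inside filter() amounts to
all_match <- raw_df %>%
  filter(if_all(c(cust_name, other_desc),
                ~ str_detect(., regex(pat, ignore_case = TRUE))))

# No row has a keyword in both columns, so nothing survives
print(nrow(all_match))
```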
Combine `across` with `Reduce` to select the rows in which any of the patterns occurs.
library(dplyr)
library(stringr)
pat <- paste(search_string, collapse = "|")
raw_df %>%
filter(Reduce(`|`, across(c(cust_name, other_desc),
~str_detect(., regex(pat, ignore_case = TRUE)))))
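Here `across` returns a data frame with one logical column per searched column. `Reduce(`|`, ...)` folds those columns together with element-wise OR, so a row is kept if it matched in at least one column. A minimal base-R illustration with made-up logical values:

```r
# Two logical columns, as across() would produce for cust_name and other_desc
matches <- data.frame(
  cust_name  = c(TRUE,  FALSE, FALSE),
  other_desc = c(FALSE, TRUE,  FALSE)
)

# A data frame is a list of columns, so Reduce() computes
# matches$cust_name | matches$other_desc
keep <- Reduce(`|`, matches)
print(keep)
```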
However, I think `if_any` is more appropriate here, since it was built for exactly this situation:
raw_df %>%
filter(if_any(c(cust_name, other_desc),
~str_detect(., regex(pat, ignore_case = TRUE))))
# cust_name other_desc trans val
# <chr> <chr> <chr> <int>
#1 Cisco nothing a 100
#2 bad_cs cisCo s 101
#3 Ibm nothing d 102
#4 bad_ib ibM f 102
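One way to confirm the migration is to run the question's `filter_at`/`any_vars` version next to the `if_any` version and compare the rows they keep. This is a self-contained sketch. It assumes a dplyr version that still ships the superseded `filter_at` together with `if_any` (1.0.4 or later):

```r
library(dplyr)
library(stringr)

raw_df <- tibble::tribble(
  ~cust_name, ~other_desc, ~trans, ~val,
  "Cisco",    "nothing",   "a",    100L,
  "bad_cs",   "cisCo",     "s",    101L,
  "Ibm",      "nothing",   "d",    102L,
  "bad_ib",   "ibM",       "f",    102L,
  "oraCle",   "Oracle",    "g",    103L,
  "mSft",     "nothing",   "k",    103L,
  "noth",     "Msft",      "j",    104L,
  "noth",     "oracle",    "l",    104L
)
pat <- paste(c("ibm", "cisco"), collapse = "|")

# Original approach: superseded scoped verb with any_vars()
old <- raw_df %>%
  filter_at(vars(cust_name, other_desc),
            any_vars(str_detect(., regex(pat, ignore_case = TRUE)))) %>%
  unique()

# Replacement: if_any() keeps a row when at least one column matches
new <- raw_df %>%
  filter(if_any(c(cust_name, other_desc),
                ~ str_detect(., regex(pat, ignore_case = TRUE))))

# Both approaches should keep the same rows in the same order
same_rows <- identical(old$cust_name, new$cust_name) &&
  identical(old$other_desc, new$other_desc)
print(same_rows)
```

`filter` never duplicates rows, so the `unique()` step from the question is not needed in either version.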
Ronak's solution works. Here is an alternative with an extra trick, which I think is roughly what `if_any` does internally. It uses `rowSums`:
rowAny <- function(x) rowSums(x) > 0
raw_df %>%
filter(rowAny(
across(
.cols = c(cust_name, other_desc),
.fns = ~ str_detect(., regex("ibm|cisco", ignore_case = TRUE))
)))
Output:
cust_name other_desc trans val
<chr> <chr> <chr> <int>
1 Cisco nothing a 100
2 bad_cs cisCo s 101
3 Ibm nothing d 102
4 bad_ib ibM f 102
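The `rowAny` helper works because `rowSums` counts `TRUE` as 1 and `FALSE` as 0, so a row sum above zero means at least one column matched. Below is a minimal base-R sketch with made-up logical values. `rowAll` is a hypothetical companion helper, not part of the answer, that covers the `if_all`-style case:

```r
rowAny <- function(x) rowSums(x) > 0
# Hypothetical companion helper: TRUE only when every column is TRUE
rowAll <- function(x) rowSums(x) == ncol(x)

# Logical columns, as across() would produce for the searched columns
matches <- data.frame(
  cust_name  = c(TRUE,  FALSE, FALSE),
  other_desc = c(FALSE, TRUE,  FALSE)
)

any_hit <- unname(rowAny(matches))
all_hit <- unname(rowAll(matches))
print(any_hit)
print(all_hit)
```

If a searched column can contain `NA`, `str_detect` returns `NA` for that cell and the row sum becomes `NA`, which `filter` drops. `rowSums(x, na.rm = TRUE)` would instead treat such cells as non-matches.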