What is the correct way to use any(), all(), etc., with the dplyr::filter() + dplyr::across() combination?
Suppose I have the following data.frame df:
# col1 col2 col3 othercol1 othercol11
# 1 Hello WHAT_hello2 Hello 10 3
# 2 WHAT_hello WHAT_hello WHAT_hello 1 2
# 3 Hello Hello Hello 9 1
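I want to process the data.frame to keep only those rows that contain the prefix WHAT_ in at least one of col1, col2, or col3.
I know I can do this easily with |, but I was trying to achieve it using dplyr::across and tidyselect::matches together with base::any and dplyr::filter to target the right columns. That does not seem to work, even when combined with dplyr::rowwise.
So what is the correct approach here? What am I doing wrong?
I want to use across + any mainly because I may not know in advance how many columns the real dataset has.
Here is my example (data + code):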
#Libraries.
library(base)
library(dplyr)
library(tidyselect)
library(stringr)
library(magrittr)
#Toy data.
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3),
                 othercol11 = sample(1:10, 3),
                 stringsAsFactors = FALSE)
#Works.
df %>%
  filter(str_detect(col1, "^WHAT_") | str_detect(col2, "^WHAT_") | str_detect(col3, "^WHAT_"))
#Output.
# col1 col2 col3 othercol1 othercol11
# 1 Hello WHAT_hello2 Hello 1 2
# 2 WHAT_hello WHAT_hello WHAT_hello 5 4
#Works (incorrectly).
df %>%
  filter(
    across(.cols = matches("^col"),
           .fns = ~ any(str_detect(.x, "^WHAT")))
  )
#Output.
# col1 col2 col3 othercol1 othercol11
# 1 Hello WHAT_hello2 Hello 1 2
# 2 WHAT_hello WHAT_hello WHAT_hello 5 4
# 3 Hello Hello Hello 4 7
#Works (incorrectly) also.
df %>%
  rowwise() %>%
  filter(
    across(.cols = matches("^col"),
           .fns = ~ any(str_detect(.x, "^WHAT")))
  )
#Output.
# col1 col2 col3 othercol1 othercol11
# <chr> <chr> <chr> <int> <int>
# 1 WHAT_hello WHAT_hello WHAT_hello 5 4
For functions that operate across a row rather than down a column, you can use c_across together with rowwise:
df %>%
  rowwise() %>%
  filter(any(str_detect(c_across(matches('^col')), '^WHAT')))
# # A tibble: 2 x 5
# # Rowwise:
# col1 col2 col3 othercol1 othercol11
# <chr> <chr> <chr> <int> <int>
# 1 Hello WHAT_hello2 Hello 9 7
# 2 WHAT_hello WHAT_hello WHAT_hello 3 10
Alternatively, using across and rowSums:
row_lgl <-
  df %>%
  transmute(across(.cols = matches("^col"), .fns = ~ str_detect(.x, "^WHAT"))) %>%
  rowSums %>%
  '>'(0)
df %>%
  filter(row_lgl)
# col1 col2 col3 othercol1 othercol11
# 1 Hello WHAT_hello2 Hello 9 7
# 2 WHAT_hello WHAT_hello WHAT_hello 3 10
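As for why the two across() + any() attempts in the question misbehave, here is a minimal base R sketch of what I believe is going on. It assumes that filter() combines the logical columns returned by across() with &, which is how the dplyr 1.0.x releases behaved as far as I can tell. The toy data below keeps only the three col columns, and the names per_column, per_row_all and per_row_any are just illustrative.

```r
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 stringsAsFactors = FALSE)

# Without rowwise(): any() sees a whole column and collapses it to one value.
# Every column contains at least one match, so each result is TRUE and,
# once recycled, no row is dropped.
per_column <- sapply(df, function(x) any(grepl("^WHAT", x)))

# With rowwise(): any() sees a single cell of each column, and the per-column
# results are then ANDed, so the test is effectively "all columns match".
per_row_all <- apply(df, 1, function(x) all(grepl("^WHAT", x)))

# What was actually wanted: at least one match among the values of each row.
per_row_any <- apply(df, 1, function(x) any(grepl("^WHAT", x)))

print(per_column)
print(per_row_all)
print(per_row_any)
```

The first result explains why all three rows were kept, the second why only the middle row survived under rowwise(), and the third is the row-level test that c_across() and rowSums() implement above.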
Using base R
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3),
                 othercol11 = sample(1:10, 3),
                 stringsAsFactors = FALSE)
df
#> col1 col2 col3 othercol1 othercol11
#> 1 Hello WHAT_hello2 Hello 1 9
#> 2 WHAT_hello WHAT_hello WHAT_hello 3 2
#> 3 Hello Hello Hello 4 8
df[apply(df, 1, function(x) sum(grepl(pattern = "^WHAT_", x = x))) != 0, ]
#> col1 col2 col3 othercol1 othercol11
#> 1 Hello WHAT_hello2 Hello 1 9
#> 2 WHAT_hello WHAT_hello WHAT_hello 3 2
Created on 2021-01-20 by the reprex package (v0.3.0)
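Note that apply(df, 1, ...) scans every column, including othercol1 and othercol11. If you want the base R version to look only at the columns whose names start with col, something like the following sketch might do. The fixed othercol values (taken from the table in the question) and the names cols and keep are only for illustration.

```r
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10, 1, 9),
                 othercol11 = c(3, 2, 1),
                 stringsAsFactors = FALSE)

# Pick the columns to test by name, however many there happen to be.
cols <- grep("^col", names(df), value = TRUE)

# One logical vector per selected column, then OR them together element-wise.
# This is the programmatic equivalent of writing cond1 | cond2 | cond3 by hand.
keep <- Reduce(`|`, lapply(df[cols], grepl, pattern = "^WHAT_"))

res <- df[keep, ]
print(res)
```

This assumes at least one column matches "^col"; with no matching columns Reduce() returns NULL and the subsetting would need a guard.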
Using tidyverse
library(tidyverse)
df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = sample(1:10, 3),
                 othercol11 = sample(1:10, 3),
                 stringsAsFactors = FALSE)
df %>%
  filter(rowSums(across(.cols = where(is.character), .fns = ~ str_detect(.x, "^WHAT"))) != 0)
#> col1 col2 col3 othercol1 othercol11
#> 1 Hello WHAT_hello2 Hello 1 3
#> 2 WHAT_hello WHAT_hello WHAT_hello 7 4
Created on 2021-01-20 by the reprex package (v0.3.0)
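One more option, assuming dplyr 1.0.4 or later is available: those releases added if_any() and if_all(), which are meant for exactly this "at least one / every selected column satisfies a predicate" pattern inside filter(). A minimal sketch, with fixed othercol values and the name res used only for illustration:

```r
library(dplyr)
library(stringr)

df <- data.frame(col1 = c("Hello", "WHAT_hello", "Hello"),
                 col2 = c("WHAT_hello2", "WHAT_hello", "Hello"),
                 col3 = c("Hello", "WHAT_hello", "Hello"),
                 othercol1 = c(10, 1, 9),
                 othercol11 = c(3, 2, 1),
                 stringsAsFactors = FALSE)

# if_any() applies the predicate to each selected column and keeps a row
# when at least one of the per-column results is TRUE for that row.
res <- df %>%
  filter(if_any(matches("^col"), ~ str_detect(.x, "^WHAT_")))

print(res)
```

Swapping if_any() for if_all() would keep only the rows where every selected column matches, and neither variant needs rowwise().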