R: using across() to find rows with only positive or only negative values (tidyverse)
The dplyr vignette "Column-wise operations" has this example:
df <- tibble(x = c("a", "b"), y = c(1, 1), z = c(-1, 1))
# Find all rows where EVERY numeric variable is greater than zero
df %>% filter(across(where(is.numeric), ~ .x > 0))
#> # A tibble: 1 x 3
#> x y z
#> <chr> <dbl> <dbl>
#> 1 b 1 1
If we change the tibble slightly:
df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
and we want the rows where both columns are positive or both are negative, we have to name the columns:
df %>% filter((y > 0 & z > 0) | (y < 0 & z < 0))
#> # A tibble: 2 x 3
#> x y z
#> <chr> <dbl> <dbl>
#> 1 b 1 1
#> 2 c -1 -1
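How can this be done with across()? This attempt returns every row, because each column is tested on its own and any non-zero value passes: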
df %>% filter(across(where(is.numeric), ~ .x > 0 | .x < 0))
#> # A tibble: 3 x 3
#> x y z
#> <chr> <dbl> <dbl>
#> 1 a 1 -1
#> 2 b 1 1
#> 3 c -1 -1
I think that, since you are working with two variables in a row-wise operation, it is much easier to use map2 from the purrr package:
library(dplyr)
library(purrr)
df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))
df %>%
filter(map2_lgl(y, z, ~ (.x > 0 & .y > 0) | (.x < 0 & .y < 0)))
# A tibble: 2 x 3
x y z
<chr> <dbl> <dbl>
1 b 1 1
2 c -1 -1
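Because the comparison and logical operators are already vectorised, the same two-column check can also be written without iterating over the rows. The following is a minimal sketch on the same example tibble; using sign() is an illustrative choice here, not something taken from the dplyr vignette or the answer above, and the name by_sign is made up for the example.

```r
library(dplyr)

df <- tibble(x = c("a", "b", "c"), y = c(1, 1, -1), z = c(-1, 1, -1))

# sign() returns -1, 0 or 1 for each element, so two equal non-zero signs
# mean that both values are positive or both are negative.
by_sign <- df %>% filter(sign(y) == sign(z), sign(y) != 0)

by_sign$x
```

This keeps rows b and c, the same rows as the map2_lgl() version, but it only covers the two-column case.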
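For each row we have to check whether the set of conditions is all TRUE or all FALSE. The possible cases are c(T, T), c(T, F) and c(F, F). Now:
- if_all keeps the c(T, T) rows
- !if_any keeps the c(F, F) rows: if_any is TRUE for c(T, T) and c(T, F), and ! negates that, leaving only the remaining case
- the two are joined by |, i.e. OR
- so we are left with only c(T, T) and c(F, F)
So this will do: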
df %>% filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))
# A tibble: 2 x 3
x y z
<chr> <dbl> <dbl>
1 b 1 1
2 c -1 -1
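Alternatively, with across() for the negative case (recent dplyr releases deprecate across() inside filter() in favour of if_all() and if_any()):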
df %>% filter(if_all(where(is.numeric), ~ .x > 0) | across(where(is.numeric), ~ .x < 0))
# A tibble: 2 x 3
x y z
<chr> <dbl> <dbl>
1 b 1 1
2 c -1 -1
Let's look at a larger example:
set.seed(201)
df <- data.frame(A = LETTERS[1:10], x = rnorm(10), y = rnorm(10), z = -1*rnorm(10))
> df
A x y z
1 A 0.28606069 0.69329617 0.24400084
2 B -0.34454603 0.22380936 0.98825314
3 C 0.32576373 0.39845694 -1.24206048
4 D -1.69658097 1.01347438 1.68266603
5 E -1.28548252 -0.64785307 -1.44289063
6 F -0.07503189 0.64845271 0.46543975
7 G 0.26693735 0.20734270 -0.69366150
8 H 0.05593404 0.06439014 0.08772557
9 I -2.30403431 0.66938092 0.95508038
10 J 0.18900414 -0.37425445 -0.17010088
> df %>% filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))
A x y z
1 A 0.28606069 0.69329617 0.24400084
2 E -1.28548252 -0.64785307 -1.44289063
3 H 0.05593404 0.06439014 0.08772557
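One caveat concerns zeros: negating if_any() keeps rows where no value satisfies the predicate, which is not quite the same as every value satisfying the opposite strict inequality. The sketch below assumes a hypothetical extra row d that contains a zero (it is not part of the original data) and compares !if_any(..., ~ .x > 0) with two explicit if_all() calls. The names df0, loose and strict are made up for the example.

```r
library(dplyr)

# Same example data plus a hypothetical row "d" with a zero in column y.
df0 <- tibble(
  x = c("a", "b", "c", "d"),
  y = c(1, 1, -1, 0),
  z = c(-1, 1, -1, -1)
)

# "No value is positive" also accepts zeros, so row d is kept.
loose <- df0 %>%
  filter(if_all(where(is.numeric), ~ .x > 0) | !if_any(where(is.numeric), ~ .x > 0))

# "Every value is strictly negative" rejects row d.
strict <- df0 %>%
  filter(if_all(where(is.numeric), ~ .x > 0) | if_all(where(is.numeric), ~ .x < 0))

loose$x
strict$x
```

Here loose keeps b, c and d, while strict keeps only b and c. If the data can never contain an exact zero, as with the rnorm() example, the two forms select the same rows.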