How to use an "or" conditional in an n_distinct function, in dplyr?
Suppose we start with this data frame:
mydat <-
  data.frame(
    ID = c(115,115,115,88,88,88,100,100),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2),
    Status_1 = c(1,2,1,1,2,3,2,1),
    Status_2 = c("Open","Open","Terminus","Open","Open","Closed","Open","Open")
  )
> mydat
   ID Period Status_1 Status_2
1 115      1        1     Open
2 115      2        2     Open
3 115      3        1 Terminus
4  88      1        1     Open
5  88      2        2     Open
6  88      3        3   Closed
7 100      1        2     Open
8 100      2        1     Open
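Next, we run the dplyr grouping below, which counts the distinct IDs by Period and Status_1 where Status_2 = "Open":
library(dplyr)

# Count distinct IDs per (Period, Status_1) group, keeping only "Open" rows
mydat %>%
  group_by(Period,Status_1) %>%
  summarize(StatusCount = n_distinct(ID[Status_2 == "Open"]))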
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           0
6      3        3           0
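I have been trying to extend the n_distinct() call above so that it also counts rows where Status_2 = "Terminus" (in addition to the "Open" already in the code). I have tried various forms of "or" conditions and summarizing tricks, with no luck yet. Any ideas on how to do this?
With Status_2 = "Terminus" included, the result would look like this: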
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           1
6      3        3           0
This may work for you. I added a logical OR (`|`) to the subsetting condition:
mydat %>%
  group_by(Period,Status_1) %>%
  summarize(StatusCount = n_distinct(ID[Status_2 == "Open"|Status_2 == "Terminus"])) %>%
  ungroup()
`summarise()` has grouped output by 'Period'. You can override using the `.groups` argument.
# A tibble: 6 x 3
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           1
6      3        3           0
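To see why the element-wise `|` changes the count for the Period 3 / Status_1 1 group, here is a small base R sketch of what happens inside that one group. It uses `length(unique(x))` as a stand-in for `n_distinct(x)` so it runs without dplyr; the helper names `keep`, `grp`, `n_open` and `n_open_or_terminus` are made up for illustration and are not part of the answer above.

```r
# Same example data as in the question
mydat <- data.frame(
  ID = c(115, 115, 115, 88, 88, 88, 100, 100),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2),
  Status_1 = c(1, 2, 1, 1, 2, 3, 2, 1),
  Status_2 = c("Open", "Open", "Terminus", "Open", "Open", "Closed", "Open", "Open")
)

# `|` compares element by element and returns one TRUE/FALSE per row
keep <- mydat$Status_2 == "Open" | mydat$Status_2 == "Terminus"

# Rows belonging to the group Period == 3, Status_1 == 1 (hypothetical helper)
grp <- mydat$Period == 3 & mydat$Status_1 == 1

# "Open" only: no row in this group qualifies, so the ID subset is empty
n_open <- length(unique(mydat$ID[grp & mydat$Status_2 == "Open"]))

# "Open" or "Terminus": the single "Terminus" row in the group is now counted
n_open_or_terminus <- length(unique(mydat$ID[grp & keep]))

print(c(open = n_open, open_or_terminus = n_open_or_terminus))
```

Note that the vectorized `|` is the right operator here; the scalar `||` is meant for length-one conditions such as those in `if ()`, not for indexing a column.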
df <-
  data.frame(
    ID = c(115,115,115,88,88,88,100,100),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2),
    Status_1 = c(1,2,1,1,2,3,2,1),
    Status_2 = c("Open","Open","Terminus","Open","Open","Closed","Open","Open")
  )

library(tidyverse)

df %>%
  group_by(Period, Status_1) %>%
  summarize(
    StatusCount = n_distinct(ID[Status_2 %in% c("Terminus", "Open")]),
    .groups = "drop"
  )
#> # A tibble: 6 x 3
#>   Period Status_1 StatusCount
#>    <dbl>    <dbl>       <int>
#> 1      1        1           2
#> 2      1        2           1
#> 3      2        1           1
#> 4      2        2           2
#> 5      3        1           1
#> 6      3        3           0
Created on 2022-01-10 by the reprex package (v2.0.1)
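The `==` ... `|` form and the `%in%` form give the same result on this data, but they can differ when the status column contains missing values. The sketch below uses two made-up vectors, `status` and `id` (the question's data has no `NA`s, so this is an assumption for illustration only), and again uses `length(unique(x))` in place of `n_distinct(x)` to stay in base R.

```r
# Hypothetical vectors, not from the question: one status is missing
status <- c("Open", NA, "Closed")
id     <- c(1, 2, 3)

# `==` propagates NA, so the index is TRUE, NA, FALSE
idx_eq <- status == "Open" | status == "Terminus"

# `%in%` never returns NA, so the index is TRUE, FALSE, FALSE
idx_in <- status %in% c("Open", "Terminus")

# Indexing with NA yields an NA element, which is then counted as a value
n_eq <- length(unique(id[idx_eq]))  # id[idx_eq] is c(1, NA)
n_in <- length(unique(id[idx_in]))  # id[idx_in] is 1

print(c(with_equals = n_eq, with_in = n_in))
```

If the real column may contain `NA`s, `%in%` is probably the safer choice; it also scales more comfortably when a third or fourth status has to be added to the list.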