Conditional filtering on a grouped variable, based on the number of rows in each group
A simple query that I can't figure out.
Example dataset:
   ACH_DATE   CODE
1 31OCT2018 A81001
2 31JAN2019 A81001
3 31OCT2018 A81002
4 31JAN2019 A81002
5 31OCT2018 A81003
6 31JAN2019 A81004
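I want to group_by the CODE variable and filter on ACH_DATE: if a group has more than one row, drop the row where ACH_DATE == "31OCT2018". Groups with a single row should be kept unchanged.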
Sample data:
df <- structure(list(ACH_DATE = c("31OCT2018", "31JAN2019", "31OCT2018",
"31JAN2019", "31OCT2018", "31JAN2019"), CODE = c("A81001", "A81001",
"A81002", "A81002", "A81003", "A81004")), row.names = c(NA, 6L
), class = "data.frame")
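We group by 'CODE' and build a logical vector from the group's row count, n(), and 'ACH_DATE'. filter() keeps a row when its group has more than one row and the date is not "31OCT2018", or when its group has exactly one row: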
library(dplyr)
df %>%
  group_by(CODE) %>%
  filter((n() > 1 & ACH_DATE != "31OCT2018") | n() == 1)
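
Since n() is always at least 1 within a group, the condition above should reduce to "the group has one row, or the date is not 31OCT2018". The sketch below rebuilds the sample data with data.frame() and applies that shorter form. The names result, group_size and result_base are only illustrative, and the base R variant using ave() is an alternative added here for comparison, not part of the answer above.

```r
library(dplyr)

# Same sample data as in the question, rebuilt with data.frame()
df <- data.frame(
  ACH_DATE = c("31OCT2018", "31JAN2019", "31OCT2018",
               "31JAN2019", "31OCT2018", "31JAN2019"),
  CODE = c("A81001", "A81001", "A81002", "A81002", "A81003", "A81004"),
  stringsAsFactors = FALSE
)

# Keep a row if its group has a single row, or if it is not the 31OCT2018 row
result <- df %>%
  group_by(CODE) %>%
  filter(n() == 1 | ACH_DATE != "31OCT2018") %>%
  ungroup()

# Base R alternative (an assumption, not from the answer):
# ave() returns each row's group size, aligned with the rows of df
group_size <- ave(seq_along(df$CODE), df$CODE, FUN = length)
result_base <- df[group_size == 1 | df$ACH_DATE != "31OCT2018", ]
```

On the sample data, both versions should keep rows 2, 4, 5 and 6: the 31JAN2019 rows for A81001 and A81002, plus the only rows for A81003 and A81004.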