Select 特定年份可用的那些组

Select those groups which are available for certain years

我有一个 data.table 如下:-

datazzz <- data.table(group = c(rep("a", times = 3),
                                rep("b", times = 4),
                                rep("c", times = 4),
                                rep("k", times = 2),
                                rep("f", times = 4)),
                      year = c(2017:2019, 2016:2019, 2016:2019, 2018, 2019,
                               2017:2020),
                      values = runif(17))

datazzz

    group year     values
 1:     a 2017 0.14710475
 2:     a 2018 0.23493958
 3:     a 2019 0.97570157
 4:     b 2016 0.82078366
 5:     b 2017 0.92685531
 6:     b 2018 0.64406726
 7:     b 2019 0.17611851
 8:     c 2016 0.96894329
 9:     c 2017 0.97501190
10:     c 2018 0.49732578
11:     c 2019 0.90125133
12:     k 2018 0.14836372
13:     k 2019 0.01368339
14:     f 2017 0.84735620
15:     f 2018 0.71688780
16:     f 2019 0.62894310
17:     f 2020 0.73526859

我只想 select 那些从 2016 年到 2019 年拥有 year 的组。因此,我得到的 data.table 看起来像

    group year     values
 1:     b 2016 0.82078366
 2:     b 2017 0.92685531
 3:     b 2018 0.64406726
 4:     b 2019 0.17611851
 5:     c 2016 0.96894329
 6:     c 2017 0.97501190
 7:     c 2018 0.49732578
 8:     c 2019 0.90125133

子集条件是所有年份都出现在组中。我们可以通过 .I.

传递行索引来构建具有该条件的变量 V1 和基于该条件的 select 行
datazzz[datazzz[, .I[all(2016:2019 %in% unique(year))], by = .(group)]$V1]

   group year     values
1:     b 2016 0.86527048
2:     b 2017 0.46478348
3:     b 2018 0.94761731
4:     b 2019 0.05005278
5:     c 2016 0.73977484
6:     c 2017 0.23698556
7:     c 2018 0.29560906
8:     c 2019 0.61450736

我们可以做到:

library(data.table)
setDT(datazzz)[, if(min(year) == 2016 & max(year)==2019) .SD, by = group]


   group year    values
1:     b 2016 0.2321175
2:     b 2017 0.2776979
3:     b 2018 0.5695105
4:     b 2019 0.7224908
5:     c 2016 0.1904413
6:     c 2017 0.4608467
7:     c 2018 0.8258316
8:     c 2019 0.7198854