Select those groups which are available for certain years
I have a data.table as follows:
library(data.table)

datazzz <- data.table(group = c(rep("a", times = 3),
                                rep("b", times = 4),
                                rep("c", times = 4),
                                rep("k", times = 2),
                                rep("f", times = 4)),
                      year = c(2017:2019, 2016:2019, 2016:2019, 2018, 2019,
                               2017:2020),
                      values = runif(17))
datazzz
    group year     values
 1:     a 2017 0.14710475
 2:     a 2018 0.23493958
 3:     a 2019 0.97570157
 4:     b 2016 0.82078366
 5:     b 2017 0.92685531
 6:     b 2018 0.64406726
 7:     b 2019 0.17611851
 8:     c 2016 0.96894329
 9:     c 2017 0.97501190
10:     c 2018 0.49732578
11:     c 2019 0.90125133
12:     k 2018 0.14836372
13:     k 2019 0.01368339
14:     f 2017 0.84735620
15:     f 2018 0.71688780
16:     f 2019 0.62894310
17:     f 2020 0.73526859
I only want to select those groups that have a row for every year from 2016 to 2019, so the resulting data.table should look like this:
   group year     values
1:     b 2016 0.82078366
2:     b 2017 0.92685531
3:     b 2018 0.64406726
4:     b 2019 0.17611851
5:     c 2016 0.96894329
6:     c 2017 0.97501190
7:     c 2018 0.49732578
8:     c 2019 0.90125133
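The subset condition is that all of the years 2016 to 2019 are present in the group. We can use .I to return the row indices of the groups that meet this condition (they end up in the default column V1) and then select the rows based on those indices: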
datazzz[datazzz[, .I[all(2016:2019 %in% unique(year))], by = .(group)]$V1]
   group year     values
1:     b 2016 0.86527048
2:     b 2017 0.46478348
3:     b 2018 0.94761731
4:     b 2019 0.05005278
5:     c 2016 0.73977484
6:     c 2017 0.23698556
7:     c 2018 0.29560906
8:     c 2019 0.61450736
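To see what the inner call returns, the one-liner can be split into two steps. This is only a minimal sketch: the fixed seed and the names `wanted`, `idx` and `result` are assumptions added for illustration and are not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the values column is reproducible
datazzz <- data.table(
  group  = rep(c("a", "b", "c", "k", "f"), times = c(3, 4, 4, 2, 4)),
  year   = c(2017:2019, 2016:2019, 2016:2019, 2018:2019, 2017:2020),
  values = runif(17)
)

wanted <- 2016:2019  # hypothetical name for the required years

# Step 1: per group, return the row numbers (.I) only when every wanted
# year is present. Groups that fail the test contribute zero rows, and the
# unnamed result column gets the default name V1.
idx <- datazzz[, .I[all(wanted %in% year)], by = group]

# Step 2: use the collected row numbers to subset the original table.
result <- datazzz[idx$V1]
```

Here `idx` holds one row per kept row of `datazzz`, so `idx$V1` can be passed straight to the outer `[`.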
Alternatively, we can keep the groups whose earliest year is 2016 and whose latest year is 2019:
library(data.table)
setDT(datazzz)[, if (min(year) == 2016 && max(year) == 2019) .SD, by = group]
   group year    values
1:     b 2016 0.2321175
2:     b 2017 0.2776979
3:     b 2018 0.5695105
4:     b 2019 0.7224908
5:     c 2016 0.1904413
6:     c 2017 0.4608467
7:     c 2018 0.8258316
8:     c 2019 0.7198854
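Note that the min/max test only looks at the first and last year, so a group with years 2016, 2017 and 2019 would also pass. If gaps matter, one possible variant is to compare the set of years directly with base R's `setequal()`. The sketch below uses a made-up table `dt` with a hypothetical group "g" that is missing 2018. It illustrates the difference and is not taken from the original answers.

```r
library(data.table)

# Hypothetical data: group "g" spans 2016-2019 but has no row for 2018.
dt <- data.table(
  group = c(rep("b", 4), rep("g", 3)),
  year  = c(2016:2019, 2016, 2017, 2019)
)

# Range test: keeps both groups, because only min and max are checked.
by_range <- dt[, if (min(year) == 2016 && max(year) == 2019) .SD, by = group]

# Set test: keeps a group only when its years are exactly 2016:2019.
by_set <- dt[, if (setequal(year, 2016:2019)) .SD, by = group]
```

On the data in the question both tests select groups b and c. They differ only when a group has gaps, or, for the `%in%` version, when it has extra years outside 2016 to 2019.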