Subset groups in a data.table using conditions on two columns

I have a data.table with many groups. I want to subset whole groups (not just rows) based on conditions on multiple columns. Consider the following data.table:

library(data.table)

DT <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), 
                 group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"), 
                 y = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8), 
                 x = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)), 
            row.names = c(NA, -12L),
            class = c("data.table", "data.frame"))
>DT

    id group  y x
 1:  1     A 14 3
 2:  2     A 19 3
 3:  3     A 16 2
 4:  4     A 10 3
 5:  5     B  6 3
 6:  6     B  8 3
 7:  7     B 14 3
 8:  8     B 19 2
 9:  9     C 10 2
10: 10     C  9 3
11: 11     C  6 3
12: 12     C  8 3

I want to keep every group that has y = 6 and x = 3 in the same row, so that only groups B and C remain (preferably using the data.table package in R):

    id group  y x
 1:  5     B  6 3
 2:  6     B  8 3
 3:  7     B 14 3
 4:  8     B 19 2
 5:  9     C 10 2
 6: 10     C  9 3
 7: 11     C  6 3
 8: 12     C  8 3

All my attempts only returned the rows that themselves contain y = 6 and x = 3, which is not what I want:

    id group  y x
 1:  5     B  6 3
 2: 11     C  6 3

Try the dplyr package:

library(dplyr)

# groups that contain at least one row with y == 6 and x == 3
groups <- DT %>% filter(y == 6, x == 3) %>% pull(group) %>% unique()

# keep every row that belongs to one of the selected groups
DT %>% filter(group %in% groups)
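
The two steps above can probably be folded into a single grouped filter. The following is only a sketch: `df` is a hypothetical plain data.frame copy of the example data (not an object from the question), and it relies on `filter()` recycling a single TRUE/FALSE per group.

```r
library(dplyr)

# Hypothetical plain data.frame copy of the example data
df <- data.frame(
  id    = 1:12,
  group = rep(c("A", "B", "C"), each = 4),
  y     = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8),
  x     = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)
)

# any() yields one logical per group, so a group is kept or dropped as a whole
res_dplyr <- df %>%
  group_by(group) %>%
  filter(any(x == 3 & y == 6)) %>%
  ungroup()
```

This avoids the intermediate `groups` vector, at the cost of evaluating the condition once per group.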

data.table:

DT[,.SD[any(x == 3 & y == 6)], by=group]

    group    id     y     x
   <char> <int> <num> <num>
1:      B     5     6     3
2:      B     6     8     3
3:      B     7    14     3
4:      B     8    19     2
5:      C     9    10     2
6:      C    10     9     3
7:      C    11     6     3
8:      C    12     8     3

Another option that may be faster:

DT[, if (any(x == 3 & y == 6)) .SD, by=group]
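
A related idiom collects row numbers with `.I` and then subsets the original table once, which also keeps the original column order. This is a sketch, not something from the answer above; the names `keep_rows` and `res_I` are made up for illustration, and the data is rebuilt with `data.table()` so the block runs on its own.

```r
library(data.table)

# Same example data as in the question, rebuilt with the data.table() constructor
DT <- data.table(
  id    = 1:12,
  group = rep(c("A", "B", "C"), each = 4),
  y     = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8),
  x     = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)
)

# Per group: all of the group's row numbers if any row matches,
# otherwise an empty integer vector
keep_rows <- DT[, .I[any(x == 3 & y == 6)], by = group]$V1

# Subset the original table by those row numbers
res_I <- DT[keep_rows]
```

Whether this is actually faster than the `.SD` variants depends on the data, so it is worth benchmarking on a realistic table before choosing.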

A data.table option:

> DT[group %in% DT[.(3, 6), group, on = .(x, y)]]
   id group  y x
1:  5     B  6 3
2:  6     B  8 3
3:  7     B 14 3
4:  8     B 19 2
5:  9     C 10 2
6: 10     C  9 3
7: 11     C  6 3
8: 12     C  8 3
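
To see why the join version works, it may help to look at the inner lookup on its own. The sketch below rebuilds the example data so it is self-contained; `matched`, `res_join` and `none` are illustrative names, and `nomatch = NULL` assumes a reasonably recent data.table version (1.12.0 or later).

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  id    = 1:12,
  group = rep(c("A", "B", "C"), each = 4),
  y     = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8),
  x     = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)
)

# Inner step: join the single lookup row (x = 3, y = 6) to DT
# and return the group of every matching row
matched <- DT[.(3, 6), group, on = .(x, y)]

# Outer step: keep all rows whose group appears in that vector
res_join <- DT[group %in% matched]

# With nomatch = NULL, a lookup that matches nothing returns an empty
# vector instead of NA, so the outer filter would keep no rows
none <- DT[.(3, 99), group, on = .(x, y), nomatch = NULL]
```

Because the condition is resolved with a join rather than a per-group evaluation, this form may scale better when there are many small groups, though that is an assumption to verify on real data.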