How to filter groups matching more than one value?

 id drug_name     med_start  med_end   
 <dbl> <chr>         <date>     <date>    
   1 pembrolizumab 2018-02-07 2018-02-07
   1 pembrolizumab 2018-02-28 2018-02-28
   2 pembrolizumab 2018-01-05 2018-01-05
   2 nivolumab     2018-09-20 2018-09-20
   2 nivolumab     2018-10-03 2018-10-03
   2 nivolumab     2018-11-01 2018-11-01
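  1. I am trying to get the IDs that have both pembrolizumab and nivolumab in drug_name. Can I group_by on id and then filter with these two conditions? In the table above, id 2 has both drug_names. I may also need to filter on more than 2 drug_names.
  2. I am also trying to check whether the gap between two med_start dates is more than x days, say 30 days. Basically, filter the IDs that have a gap of 30 days between med_start dates.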

The code for the data above is as follows

data  <- structure(list(id = structure(c(1, 1, 2, 2, 2, 2), class = "int"), 
    drug_name = c("pembrolizumab", "pembrolizumab", "pembrolizumab", 
    "nivolumab", "nivolumab", "nivolumab"), med_start = structure(c(17569, 
    17590, 17536, 17794, 17807, 17836), class = "Date"), med_end = structure(c(17569, 
    17590, 17536, 17794, 17807, 17836), class = "Date")), row.names = c(NA, 
-6L), groups = structure(list(patient_id = structure(c(1.49283861796358e-314, 
1.6423825257779e-313), class = "integer64"), .rows = structure(list(
    1:2, 3:6), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

We group by 'id', filter the groups where all of the drugs of interest are %in% the 'drug_name' column, and extract the unique 'id'

library(dplyr)
data %>%
    group_by(id) %>%
    filter(all(c("pembrolizumab", "nivolumab") %in% drug_name)) %>% 
    ungroup %>%
    pull(id) %>%
    unique

-output

[1] 2
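
The same grouped filter should extend to more than two drugs and to the second question. What follows is a minimal sketch, not part of the original answer: `drugs` and `x_days` are hypothetical names introduced here, and the table is rebuilt as a plain tibble from the values shown in the question so the block runs on its own. It assumes "gap" means the interval between consecutive med_start dates within an id.

```r
library(dplyr)

# Rebuild the example table from the question as an ungrouped tibble
data <- tibble(
  id = c(1, 1, 2, 2, 2, 2),
  drug_name = c("pembrolizumab", "pembrolizumab", "pembrolizumab",
                "nivolumab", "nivolumab", "nivolumab"),
  med_start = as.Date(c("2018-02-07", "2018-02-28", "2018-01-05",
                        "2018-09-20", "2018-10-03", "2018-11-01"))
)

drugs  <- c("pembrolizumab", "nivolumab")  # hypothetical name; add more drugs here
x_days <- 30                               # hypothetical threshold in days

# ids whose drug_name column contains every entry of `drugs`
ids_all_drugs <- data %>%
  group_by(id) %>%
  filter(all(drugs %in% drug_name)) %>%
  distinct(id) %>%
  pull(id)

# ids with at least one gap above x_days between consecutive med_start dates
ids_long_gap <- data %>%
  arrange(id, med_start) %>%
  group_by(id) %>%
  filter(any(as.numeric(diff(med_start), units = "days") > x_days)) %>%
  distinct(id) %>%
  pull(id)

ids_all_drugs  # 2
ids_long_gap   # 2
```

Flipping the comparison to `all(... <= x_days)` would instead keep the ids whose gaps all stay within the threshold, which is id 1 for this table.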

Here are some base R options

  1. First question
> unique(
+   subset(
+     data,
+     ave(match(drug_name, c("pembrolizumab", "nivolumab")), id, FUN = var) > 0,
+     select = id
+   )
+ )
# A tibble: 1 x 1
  id
  <int>
1 2
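
The `var` test above is tailored to exactly two drugs, and `match()` returns NA for any drug outside the list, so an id that also received a third drug would be dropped. If more drugs need to be checked, a per-id `all(... %in% ...)` test may be easier to extend. This is a sketch under that assumption; `drugs` is a hypothetical name and the data frame is rebuilt from the question's table.

```r
# Rebuild the two relevant columns from the question's table
data <- data.frame(
  id = c(1, 1, 2, 2, 2, 2),
  drug_name = c("pembrolizumab", "pembrolizumab", "pembrolizumab",
                "nivolumab", "nivolumab", "nivolumab")
)

drugs <- c("pembrolizumab", "nivolumab")  # hypothetical name; extend as needed

# One TRUE/FALSE per id: does this id's drug_name contain every entry of `drugs`?
has_all <- tapply(data$drug_name, data$id, function(x) all(drugs %in% x))

# tapply names its result by id, so the names of the TRUE entries are the ids
# (returned as character strings)
ids <- names(has_all)[has_all]
ids  # "2"
```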
  2. Second question
> subset(
+   data,
+   ave(as.integer(med_start), id, FUN = function(x) max(diff(x))) <= 30
+ )
# A tibble: 2 x 4
  id    drug_name     med_start  med_end
  <int> <chr>         <date>     <date>
1 1     pembrolizumab 2018-02-07 2018-02-07
2 1     pembrolizumab 2018-02-28 2018-02-28
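
The filter above keeps ids whose largest gap is at most 30 days, and it relies on the rows already being ordered by med_start within each id. To inspect the gaps directly and pick either direction of the comparison, the largest gap per id can be tabulated first. A minimal sketch, with `x_days` as a hypothetical threshold name and the data rebuilt from the question's table:

```r
# Rebuild the two relevant columns from the question's table
data <- data.frame(
  id = c(1, 1, 2, 2, 2, 2),
  med_start = as.Date(c("2018-02-07", "2018-02-28", "2018-01-05",
                        "2018-09-20", "2018-10-03", "2018-11-01"))
)

x_days <- 30  # hypothetical threshold in days

# Largest gap in days between consecutive med_start dates, per id.
# sort() makes the result independent of row order. An id with a single
# row has no gap, and max() would then return -Inf with a warning.
max_gap <- tapply(as.integer(data$med_start), data$id,
                  function(x) max(diff(sort(x))))

ids_within <- names(max_gap)[max_gap <= x_days]  # every gap at most x_days
ids_beyond <- names(max_gap)[max_gap > x_days]   # at least one gap above x_days

max_gap     # id 1: 21, id 2: 258
ids_within  # "1"
ids_beyond  # "2"
```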