Add rows to dataset depending on set of conditions

I have the following dataset:

individual number treatment
1          1       AAAA
1          2       BBBB
1          3       CCCC
1          4       EEEE
1          5       XXXX
1          7       WWWW
2          2       EEEE
2          3       AAAA
2          5       RRRR

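Individuals can receive up to 7 treatments, but some receive at most 5 (such as individual_id = 2 in the example). For each individual I need to add new rows with treatment = NA, up to the highest treatment number that individual reached (e.g. up to 7 for individual_id = 1 and up to 5 for individual_id = 2). I want something like this: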

   individual_id number treatment
    1              1       AAAA
    1              2       BBBB
    1              3       CCCC
    1              4       EEEE
    1              5       XXXX
    1              6       NA
    1              7       WWWW
    2              1       NA
    2              2       EEEE
    2              3       AAAA
    2              4       NA
    2              5       RRRR
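For anyone who wants to run the answers, here is a minimal base R sketch that encodes both tables above. The names `df` and `expected` and the column name `individual` are assumptions (`df` and `individual` follow the answers), and the treatment codes are kept as plain strings rather than factors.

```r
# Input table from the question (name and columns chosen to match the answers)
df <- data.frame(
  individual = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  number     = c(1, 2, 3, 4, 5, 7, 2, 3, 5),
  treatment  = c("AAAA", "BBBB", "CCCC", "EEEE", "XXXX", "WWWW",
                 "EEEE", "AAAA", "RRRR"),
  stringsAsFactors = FALSE
)

# Desired result: numbers 1..max(number) per individual,
# with NA wherever no treatment was recorded
expected <- data.frame(
  individual = c(rep(1, 7), rep(2, 5)),
  number     = c(1:7, 1:5),
  treatment  = c("AAAA", "BBBB", "CCCC", "EEEE", "XXXX", NA, "WWWW",
                 NA, "EEEE", "AAAA", NA, "RRRR"),
  stringsAsFactors = FALSE
)

# Highest treatment number per individual: 7 for individual 1, 5 for individual 2
tapply(df$number, df$individual, max)
```

Every row of `df` appears unchanged in `expected`; the only new rows are the three NA rows.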

Here is a reproducible example of my actual dataset:

structure(list(individual_id = c(21L, 21L, 21L, 21L, 21L, 21L, 
22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 
24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 25L, 25L, 
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 
27L), number = c(2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 7, 7, 7, 
7, 7, 7, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4), treatment = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 
4L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 
4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Adalimumab", "Etanercept", 
"Infliximab", "Rituximab"), class = "factor")), row.names = c(NA, 
-72L), class = "data.frame")
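Applied to the actual data, the helper-frame-plus-left-join idea (the `by()`/`expand.grid()`/`merge()` approach from the second answer) might look like the sketch below. It uses the question's column name `individual_id`; `df_real`, `df_u`, `fill_df` and `filled` are illustrative names. Each distinct row in this dput appears six times, and dropping those exact duplicates with `unique()` is an assumption about what the data means.

```r
# The actual data from the question, copied from the dput above
df_real <- structure(list(individual_id = c(21L, 21L, 21L, 21L, 21L, 21L, 
22L, 22L, 22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 
24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 24L, 25L, 25L, 
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 
26L, 26L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 
27L), number = c(2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 7, 7, 7, 
7, 7, 7, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4), treatment = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 
4L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 
4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Adalimumab", "Etanercept", 
"Infliximab", "Rituximab"), class = "factor")), row.names = c(NA, 
-72L), class = "data.frame")

# Assumption: the repeated copies carry no extra information, so keep one of each
df_u <- unique(df_real)

# For each individual, pair its id with every number from 1 to max(number),
# then stack the per-individual grids into one helper frame
fill_df <- do.call(rbind, by(df_u, df_u$individual_id, function(sub)
  expand.grid(individual_id = unique(sub$individual_id),
              number = seq_len(max(sub$number)))))

# Left join: keep every helper row; missing (individual_id, number) pairs get NA
filled <- merge(fill_df, df_u, all.x = TRUE)
filled
```

With this data the 12 distinct rows grow to 19, for example individual 21 (whose only record is number 2) gains an NA row for number 1.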

For this we can use:

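library(tidyverse)

# Within each individual, add the missing `number` values;
# rows created by complete() get treatment = NA.
# complete() already runs per group, so nesting(individual) is not needed
# (recent tidyr versions refuse to complete a grouping column).
df %>% 
  group_by(individual) %>% 
  complete(number = seq(min(number), max(number), 1))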


# # A tibble: 11 x 3
# # Groups:   individual [2]
#   individual number treatment
#        <int>  <dbl>     <fct>    
# 1          1      1      AAAA     
# 2          1      2      BBBB     
# 3          1      3      CCCC     
# 4          1      4      EEEE     
# 5          1      5      XXXX     
# 6          1      6        NA       
# 7          1      7      WWWW     
# 8          2      2      EEEE     
# 9          2      3      AAAA     
# 10         2      4        NA       
# 11         2      5      RRRR   

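Note: for this specific problem, as pointed out in the comments below, number = seq(min(number), max(number), 1) should be number = seq(1, max(number), 1), because 1 is always the first number whether or not it appears in the records. I have left the line above as it is because it seems to be a more general solution.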
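To make the difference described in the note concrete, here is a small base R check on the toy data from the question. The names `toy`, `by_ind`, `from_min` and `from_one` are illustrative, and `split()` plus `lapply()` stand in for the grouped `complete()` call; only the range of numbers generated per individual is compared.

```r
# Toy data from the question: individual 2's records start at number 2
toy <- data.frame(
  individual = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  number     = c(1, 2, 3, 4, 5, 7, 2, 3, 5)
)

# Observed numbers, one vector per individual
by_ind <- split(toy$number, toy$individual)

# Range used in the tidyverse code: from the smallest observed number
from_min <- lapply(by_ind, function(x) seq(min(x), max(x), 1))
# Range the question asks for: always start at 1
from_one <- lapply(by_ind, function(x) seq(1, max(x), 1))

from_min[["2"]]  # 2 3 4 5   (no row would be added for number 1)
from_one[["2"]]  # 1 2 3 4 5
```

For individual 1 both ranges are identical because its records already start at 1; only individuals whose first recorded number is above 1 are affected.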

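Consider building a helper data frame containing every possible pairing of individual and treatment number, then running a left-join merge against the original dataset.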

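Below, the data is split by individual with by(), and expand.grid builds, for each individual, a data frame of all pairwise combinations of individual and number. Finally, do.call binds the list of per-group data frames into one final data frame: fill_df.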

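# For each individual, pair it with every number from 1 to max(number),
# then stack the per-individual data frames into one helper frame
fill_df <- do.call(rbind, by(df, df$individual, function(sub) 
                                expand.grid(individual = unique(sub$individual),
                                            number = 1:max(sub$number))
                          )
                  )

# Left join: keep every helper row; pairs missing from df get treatment = NA
final_df <- merge(fill_df, df, all.x=TRUE)
final_df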

#    individual number treatment
# 1           1      1      AAAA
# 2           1      2      BBBB
# 3           1      3      CCCC
# 4           1      4      EEEE
# 5           1      5      XXXX
# 6           1      6      <NA>
# 7           1      7      WWWW
# 8           2      1      <NA>
# 9           2      2      EEEE
# 10          2      3      AAAA
# 11          2      4      <NA>
# 12          2      5      RRRR
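If the by()/do.call() step feels heavy, the same helper frame can be built without a per-group function. This is a swapped-in variant, not the answer's code: it uses base R's tapply(), rep() and sequence() to generate all (individual, 1..max) pairs at once, then performs the same left join with merge(). The names `max_n`, `counts` and the toy `df` are illustrative.

```r
# Toy input from the question (treatments as plain strings)
df <- data.frame(
  individual = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  number     = c(1, 2, 3, 4, 5, 7, 2, 3, 5),
  treatment  = c("AAAA", "BBBB", "CCCC", "EEEE", "XXXX", "WWWW",
                 "EEEE", "AAAA", "RRRR"),
  stringsAsFactors = FALSE
)

# Highest treatment number per individual, named by individual
max_n  <- tapply(df$number, df$individual, max)
counts <- as.integer(max_n)

# One row per (individual, 1..max) pair: rep() repeats each id max times,
# sequence() concatenates 1:counts[1], 1:counts[2], ...
fill_df <- data.frame(
  individual = rep(as.numeric(names(max_n)), times = counts),
  number     = sequence(counts)
)

# Left join keeps every helper row; unmatched treatments become NA
final_df <- merge(fill_df, df, all.x = TRUE)
final_df
```

The result has the same 12 rows as the by() version: NA at number 6 for individual 1 and at numbers 1 and 4 for individual 2.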