Data transformation: I am looking for an efficient way in R to recode/expand many-to-one for survival analysis

Question

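I am looking at graft patency after surgery (CABG).

In CABG surgery a single patient typically receives more than one graft (bypass), and we are looking at time to failure. In the raw data this is indicated by a variable giving the number of failed grafts and the time at which the failure was diagnosed.

My raw data is currently one row per patient, and I believe I need to make it one row per graft in order to continue to KM and Cox analyses. I am considering various if/then loops, but wonder whether there is a more efficient way to recode here.

Example data: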

Patient VeinGrafts   VeinsOccluded   Months
   1        2               0           36
   2        4               1           34
   3        3               2           38
   4        4               0           33

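To look at this "per vein", I need to recode so that each graft counted in VeinGrafts has its own row and VeinsOccluded becomes 1/0.

I need each row replicated (VeinGrafts) times, so that patient 2 will have 4 rows, one of which has the VeinsOccluded indicator set and the other 3 do not.

This is how I need the above data to look for my next analysis step: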

Patient VeinGrafts   VeinsOccluded   Months
   1        2               0           36
   1        2               0           36
   2        4               1           34
   2        4               0           34
   2        4               0           34
   2        4               0           34
   3        3               1           38
   3        3               1           38
   3        3               0           38
   4        4               0           33
   4        4               0           33
   4        4               0           33
   4        4               0           33

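This community has been incredibly helpful up to this point, but I have not been able to find a similar question answered - if I have overlooked one I apologize, and I certainly appreciate any ideas you may have!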

Answer 1

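We can expand the data with uncount, then group by 'Patient' and mutate 'VeinsOccluded' by creating a logical expression that compares row_number() with the first value of 'VeinsOccluded'. Within each patient, the first 'VeinsOccluded' rows evaluate to TRUE and the rest to FALSE; the unary + coerces the logical to binary (1/0).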

library(dplyr)
library(tidyr)
df1 %>%
    uncount(VeinGrafts, .remove = FALSE) %>%
    group_by(Patient) %>% 
    mutate(VeinsOccluded = +(row_number() <= first(VeinsOccluded))) %>%
    ungroup %>%
    select(names(df1))

-output

# A tibble: 13 x 4
#   Patient VeinGrafts VeinsOccluded Months
#     <int>      <int>         <int>  <int>
# 1       1          2             0     36
# 2       1          2             0     36
# 3       2          4             1     34
# 4       2          4             0     34
# 5       2          4             0     34
# 6       2          4             0     34
# 7       3          3             1     38
# 8       3          3             1     38
# 9       3          3             0     38
#10       4          4             0     33
#11       4          4             0     33
#12       4          4             0     33
#13       4          4             0     33

Or this can be done with data.table (probably in a more efficient way). Note that setDT converts df1 to a data.table by reference.

library(data.table)
setDT(df1)[rep(seq_len(.N), VeinGrafts)][,
   VeinsOccluded := +(seq_len(.N) <= first(VeinsOccluded)), Patient][]

-output

#      Patient VeinGrafts VeinsOccluded Months
# 1:       1          2             0     36
# 2:       1          2             0     36
# 3:       2          4             1     34
# 4:       2          4             0     34
# 5:       2          4             0     34
# 6:       2          4             0     34
# 7:       3          3             1     38
# 8:       3          3             1     38
# 9:       3          3             0     38
#10:       4          4             0     33
#11:       4          4             0     33
#12:       4          4             0     33
#13:       4          4             0     33
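
For comparison, the same expand-then-flag logic can be sketched in base R without extra packages. This is only an illustrative sketch: the names long and graft_id are made up for the example, and it assumes the input has exactly one row per patient.

# Same example data as in the "data" section of this answer
df1 <- data.frame(
  Patient       = 1:4,
  VeinGrafts    = c(2L, 4L, 3L, 4L),
  VeinsOccluded = c(0L, 1L, 2L, 0L),
  Months        = c(36L, 34L, 38L, 33L)
)

# Repeat each patient row once per graft
long <- df1[rep(seq_len(nrow(df1)), times = df1$VeinGrafts), ]
rownames(long) <- NULL

# Position of each graft within its patient: 1, 2, ..., VeinGrafts
# (graft_id is a hypothetical helper name)
graft_id <- sequence(df1$VeinGrafts)

# The first 'VeinsOccluded' grafts of each patient get 1, the rest 0
long$VeinsOccluded <- as.integer(graft_id <= long$VeinsOccluded)

# Sanity checks: one row per graft, and the per-patient sum of the
# indicator equals the original count (df1 is sorted by Patient here)
stopifnot(
  nrow(long) == sum(df1$VeinGrafts),
  all(tapply(long$VeinsOccluded, long$Patient, sum) == df1$VeinsOccluded)
)
long

The two checks are worth keeping whichever approach is used, since they confirm that no grafts or occlusion events were lost or duplicated during the expansion.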

data

df1 <- structure(list(Patient = 1:4, VeinGrafts = c(2L, 4L, 3L, 4L), 
    VeinsOccluded = c(0L, 1L, 2L, 0L), Months = c(36L, 34L, 38L, 
    33L)), class = "data.frame", row.names = c(NA, -4L))
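
As a rough sketch of the next step mentioned in the question, the expanded data can be passed to the survival package for a Kaplan-Meier estimate. This assumes that Months is the follow-up time for every graft of a patient and that grafts with VeinsOccluded equal to 0 are censored at that time; neither assumption is confirmed by the question. The name long is again made up for the example.

library(survival)

# Example data, as defined above
df1 <- data.frame(
  Patient       = 1:4,
  VeinGrafts    = c(2L, 4L, 3L, 4L),
  VeinsOccluded = c(0L, 1L, 2L, 0L),
  Months        = c(36L, 34L, 38L, 33L)
)

# Expand to one row per graft with a 1/0 occlusion indicator
long <- df1[rep(seq_len(nrow(df1)), times = df1$VeinGrafts), ]
long$VeinsOccluded <- as.integer(sequence(df1$VeinGrafts) <= long$VeinsOccluded)

# Kaplan-Meier estimate: time = Months, event = graft occlusion
# (assumes non-occluded grafts are censored at Months)
km <- survfit(Surv(Months, VeinsOccluded) ~ 1, data = long)
summary(km)

Grafts from the same patient are not independent observations, so for a Cox model it may be worth adding a cluster(Patient) term to the coxph formula to request robust standard errors. Whether that is appropriate depends on the study design.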
