Expand a numerical variable based on sequence and value to multiple columns

I have data like this:

structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 
1L, 4L, 2L)), class = "data.frame", row.names = c(NA, 
-9L))

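These numbers are the number of times each participant took part in a study. I want to create one binary indicator column per time point, holding 1 if the participant took part at that time and NA otherwise. The desired output looks like this: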

structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L), 
    t1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), t2 = c(1L, 1L, 
    1L, NA, 1L, 1L, NA, 1L, 1L), t3 = c(1L, 1L, NA, NA, NA, 1L, 
    NA, 1L, NA), t4 = c(NA, 1L, NA, NA, NA, NA, NA, 1L, NA)), class = "data.frame", row.names = c(NA, 
-9L))

Here is a very direct approach using base R (calling your input df):

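# Number of indicator columns needed
max_length = max(df$time)
# For each participant: `t` ones followed by NA padding up to max_length
rows = lapply(df$time, function(t) c(rep(1, t), rep(NA, max_length - t)))
# Stack the rows into a matrix and bind it to the original data
result = cbind(df, do.call(rbind, rows))
# The new columns are named "1", "2", ...; prefix them with "t"
names(result)[-1] = paste0("t", names(result)[-1])
result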
#   time t1 t2 t3 t4
# 1    3  1  1  1 NA
# 2    4  1  1  1  1
# 3    2  1  1 NA NA
# 4    1  1 NA NA NA
# 5    2  1  1 NA NA
# 6    3  1  1  1 NA
# 7    1  1 NA NA NA
# 8    4  1  1  1  1
# 9    2  1  1 NA NA
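
To make the padding step concrete, here is a minimal sketch of what the function inside lapply builds for a single participant. The values 2 and 4 are just illustrative inputs taken from the example data (a participant with time = 2, and the maximum time of 4):

```r
max_length <- 4   # largest value in df$time
t <- 2            # one participant's time value

# Two ones, then NA padding so every row has max_length entries
one_row <- c(rep(1, t), rep(NA, max_length - t))
one_row
# [1]  1  1 NA NA
```

Because every row is padded to the same length, do.call(rbind, rows) can stack them into a rectangular matrix.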

Another base R option (here the input data frame is named dtt):

m <- matrix(nrow = nrow(dtt), ncol = max(dtt$time))
m[col(m) <= dtt$time] <- 1L
cbind(dtt, m)
#   time 1  2  3  4
# 1    3 1  1  1 NA
# 2    4 1  1  1  1
# 3    2 1  1 NA NA
# 4    1 1 NA NA NA
# 5    2 1  1 NA NA
# 6    3 1  1  1 NA
# 7    1 1 NA NA NA
# 8    4 1  1  1  1
# 9    2 1  1 NA NA
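
The matrix version leaves the new columns named 1 through 4 rather than t1 through t4. A minimal sketch of one way to get the prefixed names, assuming the input is named dtt as above, is to set the matrix column names before binding. The name result is only used here for illustration:

```r
dtt <- data.frame(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L))

# All-NA matrix: one row per participant, one column per time point
m <- matrix(nrow = nrow(dtt), ncol = max(dtt$time))

# col(m) gives each cell's column index; dtt$time is recycled down the rows,
# so a cell becomes 1 when its column index does not exceed that row's time
m[col(m) <= dtt$time] <- 1L

# Name the columns t1, t2, ... before binding them to the data
colnames(m) <- paste0("t", seq_len(ncol(m)))
result <- cbind(dtt, m)
result
```

Since the matrix is filled with 1L, the indicator columns are integers, which matches the column types in the desired output.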

Here is a tidyverse approach:

library(dplyr)
library(tidyr)

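df %>%
  mutate(row = row_number()) %>%         # id for each original row
  uncount(time, .remove = FALSE) %>%     # repeat each row `time` times
  group_by(row) %>%
  mutate(col = row_number()) %>%         # 1..time within each original row
  # one column per time point; length() of each single-value cell gives 1
  pivot_wider(names_from = col, values_from = col, 
              values_fn = length, names_prefix = 't') %>%
  ungroup %>%
  select(-row)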

#   time    t1    t2    t3    t4
#  <int> <int> <int> <int> <int>
#1     3     1     1     1    NA
#2     4     1     1     1     1
#3     2     1     1    NA    NA
#4     1     1    NA    NA    NA
#5     2     1     1    NA    NA
#6     3     1     1     1    NA
#7     1     1    NA    NA    NA
#8     4     1     1     1     1
#9     2     1     1    NA    NA