Expand a numeric variable into multiple columns based on sequence and value
I have data like this:
structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L)),
          class = "data.frame", row.names = c(NA, -9L))
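These numbers are the number of time points each participant took part in the study. I want to create one binary column per time point, with 1 for every time point the participant reached and NA otherwise, like this: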
structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L),
               t1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
               t2 = c(1L, 1L, 1L, NA, 1L, 1L, NA, 1L, 1L),
               t3 = c(1L, 1L, NA, NA, NA, 1L, NA, 1L, NA),
               t4 = c(NA, 1L, NA, NA, NA, NA, NA, 1L, NA)),
          class = "data.frame", row.names = c(NA, -9L))
Here's a pretty direct approach using base R (calling your input df):
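# The longest participation determines how many columns are needed
max_length = max(df$time)
# For each participant: t ones, padded with NA up to max_length
rows = lapply(df$time, function(t) c(rep(1, t), rep(NA, max_length - t)))
# Stack the rows into a matrix and bind it next to the original data
result = cbind(df, do.call(rbind, rows))
# The matrix columns come out named "1", "2", ...; prefix them with "t"
names(result)[-1] = paste0("t", names(result)[-1])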
result
# time t1 t2 t3 t4
# 1 3 1 1 1 NA
# 2 4 1 1 1 1
# 3 2 1 1 NA NA
# 4 1 1 NA NA NA
# 5 2 1 1 NA NA
# 6 3 1 1 1 NA
# 7 1 1 NA NA NA
# 8 4 1 1 1 1
# 9 2 1 1 NA NA
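A further base R variation, offered as a sketch rather than as part of the answer above: outer() compares every time value against every period 1..max(time) in a single vectorized call, so no per-row list or rbind is needed. It assumes the input is the question's data stored as df; in_study and indicator are illustrative names.

```r
# Input from the question
df <- data.frame(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L))

# outer() compares each time value with each period 1..max(time):
# entry [i, k] is TRUE when participant i reached time point k
in_study <- outer(df$time, seq_len(max(df$time)), FUN = ">=")

# Map TRUE to 1L and FALSE to NA to match the desired output
# (ifelse() keeps the dim attribute of its logical matrix argument)
indicator <- ifelse(in_study, 1L, NA_integer_)
colnames(indicator) <- paste0("t", seq_len(ncol(indicator)))

result <- cbind(df, indicator)
result
```

Compared with building each row as a list element, this keeps the logic in one comparison; the column names are set on the matrix before binding, so no renaming step is needed afterwards.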
Another base R option:
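# Empty (all-NA) matrix: one row per participant, one column per time point
m <- matrix(nrow = nrow(df), ncol = max(df$time))
# col(m) holds each cell's column index; df$time is recycled down the
# columns, so cell [i, k] is set to 1 when k <= time[i]
m[col(m) <= df$time] <- 1L
cbind(df, m)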
# time 1 2 3 4
# 1 3 1 1 1 NA
# 2 4 1 1 1 1
# 3 2 1 1 NA NA
# 4 1 1 NA NA NA
# 5 2 1 1 NA NA
# 6 3 1 1 1 NA
# 7 1 1 NA NA NA
# 8 4 1 1 1 1
# 9 2 1 1 NA NA
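To make the recycling in this answer explicit, here is a minimal sketch of the same col(m) idea that also gives the columns the t1, t2, ... names from the question. It assumes the input data frame is named df; keep is just an illustrative name for the logical mask.

```r
df <- data.frame(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L))

# Integer NA matrix: one row per participant, one column per time point
m <- matrix(NA_integer_, nrow = nrow(df), ncol = max(df$time))

# col(m) is a matrix of column indices. Comparing it with df$time (length
# nrow(m)) recycles the vector down each column, so keep[i, k] is
# TRUE exactly when k <= time[i]
keep <- col(m) <= df$time
m[keep] <- 1L

# Name the columns t1, t2, ... as in the desired output
colnames(m) <- paste0("t", seq_len(ncol(m)))
result <- cbind(df, m)
result
```

The number of non-NA cells equals sum(df$time), which is a quick sanity check that every participant got exactly as many 1s as their time value.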
Here is a tidyverse approach:
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
uncount(time, .remove = FALSE) %>%
group_by(row) %>%
mutate(col = row_number()) %>%
pivot_wider(names_from = col, values_from = col,
values_fn = length, names_prefix = 't') %>%
ungroup %>%
select(-row)
# time t1 t2 t3 t4
# <int> <int> <int> <int> <int>
#1 3 1 1 1 NA
#2 4 1 1 1 1
#3 2 1 1 NA NA
#4 1 1 NA NA NA
#5 2 1 1 NA NA
#6 3 1 1 1 NA
#7 1 1 NA NA NA
#8 4 1 1 1 1
#9 2 1 1 NA NA
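All three answers code "not reached" as NA, as in the question's expected output. If a strict 0/1 coding is wanted instead (an assumption; the question only shows 1/NA), a small post-processing step works on the result of any of them. The helper name to_binary is hypothetical, and the sketch assumes the indicator columns are every column except time.

```r
# Hypothetical helper: replace NA with 0L in every column except `time`
to_binary <- function(result) {
  ind <- names(result) != "time"
  result[ind] <- lapply(result[ind], function(x) ifelse(is.na(x), 0L, as.integer(x)))
  result
}

# Desired output from the question (1/NA coding)
expected <- structure(list(time = c(3L, 4L, 2L, 1L, 2L, 3L, 1L, 4L, 2L),
                           t1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                           t2 = c(1L, 1L, 1L, NA, 1L, 1L, NA, 1L, 1L),
                           t3 = c(1L, 1L, NA, NA, NA, 1L, NA, 1L, NA),
                           t4 = c(NA, 1L, NA, NA, NA, NA, NA, 1L, NA)),
                      class = "data.frame", row.names = c(NA, -9L))

binary <- to_binary(expected)
binary
```

Working column by column with lapply() keeps the helper independent of whether the input is a plain data.frame or a tibble.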