How to create a wide data set from time series and frequency data?

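Given time series and frequency data such as dat1, which contains an event_id and the number of times (times) each event is repeated.

What is the most elegant way in R to convert it into wide-format sequence data such as dat2?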

dat1
id    event_no  event_id  times
P001  1         A         3
P001  2         B         1
P001  3         C         2
P001  4         D         5
P002  1         A         5
P002  2         B         3
P002  3         C         1
P002  4         D         1
P002  5         E         1

dat2
id    t1  t2  t3  t4  t5  t6  t7  t8  t9  t10  t11
P001  A   A   A   B   C   C   D   D   D   D    D
P002  A   A   A   A   A   B   B   B   C   D    E

Thanks

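Using dplyr and tidyr, we can first repeat the rows with uncount, then, after grouping by id, number the rows within each group to get a unique column name per position (t1, t2, ...), and finally use pivot_wider to convert the data to wide format.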

library(dplyr)
library(tidyr)

df %>%
  uncount(times) %>%
  group_by(id) %>%
  mutate(event_no = paste0("t", row_number())) %>%
  pivot_wider(names_from = event_no, values_from = event_id)
  # In older versions of tidyr (without pivot_wider), use spread instead:
  # spread(event_no, event_id)

#  id    t1    t2    t3    t4    t5    t6    t7    t8    t9    t10   t11  
#  <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
#1 P001    A     A     A     B     C     C     D     D     D     D     D    
#2 P002    A     A     A     A     A     B     B     B     C     D     E    

Data

df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), .Label = c("P001", "P002"), class = "factor"), event_no = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), event_id = structure(c(1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), times = c(3L, 1L, 2L, 5L, 5L, 3L, 1L, 1L, 
1L)), class = "data.frame", row.names = c(NA, -9L))
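
For comparison, the same two steps (repeat each event by its count, then lay the positions out as columns) can be sketched in base R without extra packages. This is only an illustrative alternative, not the answer's method. It assumes the rows are already ordered by event_no within each id, it uses character columns instead of factors for simplicity, and the names seqs, n_max and wide are made up for this sketch.

```r
# Copy of the question's dat1 (assumption: character columns instead of factors)
dat1 <- data.frame(
  id       = c("P001", "P001", "P001", "P001", "P002", "P002", "P002", "P002", "P002"),
  event_no = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L),
  event_id = c("A", "B", "C", "D", "A", "B", "C", "D", "E"),
  times    = c(3L, 1L, 2L, 5L, 5L, 3L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Repeat each event_id `times` times, then split the long vector by id
seqs <- split(rep(dat1$event_id, dat1$times), rep(dat1$id, dat1$times))

# Pad shorter sequences with NA so every id gets the same number of columns
n_max <- max(lengths(seqs))
wide <- t(vapply(seqs, function(x) x[seq_len(n_max)], character(n_max)))
colnames(wide) <- paste0("t", seq_len(n_max))

# One row per id, one column per position in the sequence
dat2 <- data.frame(id = rownames(wide), wide, row.names = NULL,
                   stringsAsFactors = FALSE)
dat2
```

In this example both ids happen to have 11 positions, so no padding occurs; with unequal totals, the shorter sequences would end in NA, which matches how pivot_wider fills missing cells by default.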