How to create a wide data set from time series and frequency data?
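Given time series and frequency data such as dat1, which contains an event_id and the number of times each event occurs,
what is the most elegant way in R to convert it to wide sequence data such as dat2?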
dat1
id event_no event_id times
P001 1 A 3
P001 2 B 1
P001 3 C 2
P001 4 D 5
P002 1 A 5
P002 2 B 3
P002 3 C 1
P002 4 D 1
P002 5 E 1
dat2
id t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11
P001 A A A B C C D D D D D
P002 A A A A A B B B C D E
Thanks.
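Using dplyr and tidyr, we can first repeat the rows with uncount, then, after grouping by id, number the rows within each group to create unique column names, and use pivot_wider to convert the data to wide format.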
library(dplyr)
library(tidyr)
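df %>%
  # repeat each row `times` times; uncount() drops the `times` column
  uncount(times) %>%
  group_by(id) %>%
  # label the repeated rows within each id as t1, t2, ...
  mutate(event_no = paste0("t", row_number())) %>%
  # one row per id, one column per time step
  pivot_wider(names_from = event_no, values_from = event_id)
# In versions of tidyr older than 1.0.0, use spread() instead of pivot_wider():
# spread(event_no, event_id)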
# id t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11
# <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct>
#1 P001 A A A B C C D D D D D
#2 P002 A A A A A B B B C D E
Data
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), .Label = c("P001", "P002"), class = "factor"), event_no = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), event_id = structure(c(1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), times = c(3L, 1L, 2L, 5L, 5L, 3L, 1L, 1L,
1L)), class = "data.frame", row.names = c(NA, -9L))