Reshape large dataset from long to wide with two ID variables

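I want to reshape my data from long format to wide format using two ID variables.

The code below works on the sample data set that follows. However, when I run it on the much larger data set I am actually working with, it runs for a very long time and never seems to finish. With a single ID variable the code runs fine, but I need to include two.

Is there a more efficient way to convert from long to wide format?

(I have also considered building a single ID variable from ID1 and ID2 and using that for the long-to-wide conversion. Perhaps that is the best solution?)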

# Columns that vary over time and should be spread into wide format
Wide.vars <- c("Date", "V1")


### 1. Reshape from long to wide format with two ID variables
df_wide <- reshape(as.data.frame(df),
                   idvar = c("ID1", "ID2"),
                   direction = "wide",
                   v.names = Wide.vars,
                   timevar = "Timepoint")

Sample data is below (note that the sample data set is 15 rows by 5 columns, whereas the data set I am working with is 15,658 rows by 99 columns).

df <- structure(list(ID1 = c(5643923L, 5643923L, 5643923L, 3914822L, 
3914822L, 3914822L, 3914822L, 1156115L, 1506426L, 7183921L, 4753447L, 
4606792L, 8492773L, 8492773L, 8492773L), ID2 = c("02179", 
"02179", "04101", "00819", "00819", "00819", "00819", 
"01904", "01127", "00475", "02084", "04118", "15553", 
"15553", "15553"), Date = structure(c(16731, 16731, 
16731, 16732, 16733, 16733, 16733, 16733, 16733, 16733, 16733, 
16733, 16734, 16734, 16734), class = "Date"), Timepoint = structure(c(1L, 
3L, 1L, 1L, 3L, 4L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L), .Label = c("baseline", 
"wave0.5", "wave1", "wave2", "wave3", "wave4"), class = "factor"), V1 = c(0, 8, 4, 9.5, 7, 7, 12, 9, 11, 8.4, 
    7.8, 6.6, 5, 5.5, 8.9)), row.names = c(NA, 
-15L), groups = structure(list(CP1_t_210 = structure(1L, .Label = c("baseline", 
"wave0.5", "wave1", "wave2", "wave3", "wave4"), class = "factor"), 
    .rows = structure(list(1:15), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))


data.table is usually faster; you could try dcast:

library(data.table)
dcast(setDT(df), ID1+ID2~Timepoint, value.var = c('Date', 'V1'))

The pivot_wider approach suggested by @Mark Davies would also help.

tidyr::pivot_wider(df, names_from = Timepoint, values_from = c(Date, V1))
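
The combined-ID idea raised in the question can also be tried in base R. The following is only a minimal sketch under stated assumptions: the data frame `long`, its values, the key column `ID`, and the `"_"` separator are all made up for illustration and are not the asker's data. Whether it is actually faster on the full 15,658 x 99 data set is untested here.

```r
# Small stand-in data set (hypothetical values, not the asker's data)
long <- data.frame(
  ID1 = c(1L, 1L, 1L, 2L),
  ID2 = c("a", "a", "b", "a"),
  Timepoint = c("baseline", "wave1", "baseline", "baseline"),
  V1 = c(0, 8, 4, 9.5),
  stringsAsFactors = FALSE
)

# Build one key from the two ID columns.
# Assumption: the separator "_" never occurs inside ID1 or ID2,
# so two different ID pairs cannot collapse into the same key.
long$ID <- paste(long$ID1, long$ID2, sep = "_")

# Reshape with the single combined key; ID1 and ID2 are constant
# within each key, so they are carried along as ordinary columns.
wide <- reshape(long,
                idvar = "ID",
                timevar = "Timepoint",
                v.names = "V1",
                direction = "wide")
```

The result has one row per ID1/ID2 pair, with the original ID1 and ID2 columns kept next to the helper key, so `ID` can be dropped afterwards if it is not needed.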