Avoid guessing of 'varying' in reshape()

I'm trying to reshape() some time-varying data in R, using the following dataset:

dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")

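These are time-varying data from a longitudinal study, and a subset of a larger dataset that I import from a source file. I want to extract the values pop, coe and rcb for the baseline and final study visits (my full dataset has several visits in between, which I have omitted for the purposes of this question).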

I can do the following:

reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')

However, this ends up with the values of pop labelled as coe. The reshape2 documentation tells me I should name the varying values explicitly to avoid 'guessing'. So I try this:

reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')

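This produces exactly the same output, despite naming the varying argument explicitly. What am I doing wrong? Presumably pop ends up with the values of coe because of alphabetical ordering, but I don't understand why that happens, given that I have now declared the varying argument explicitly...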
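A likely explanation for the alphabetical ordering, offered as a sketch rather than a definitive diagnosis: when varying is a plain vector and v.names is supplied, stats::reshape() appears to group the wide columns with split(varying, rep(v.names, ntimes)). That is how the function's source reads, but treat it as an assumption and check reshape's source in your R version. split() converts its grouping vector to a factor whose levels are sorted, so the groups come back as coe, pop, rcb while the new columns are still named pop, coe, rcb. The snippet below reproduces only that split() step; it does not call reshape().

```r
# Reproduce only the grouping step that reshape() is assumed to perform
# when 'varying' is a vector and 'v.names' is given.
v.names <- c("pop", "coe", "rcb")
varying <- c("baseline_pop", "baseline_coe", "baseline_rcb",
             "final_pop", "final_coe", "final_rcb")

# split() coerces the grouping vector to a factor with sorted levels,
# so the list is ordered alphabetically, not in v.names order.
groups <- split(varying, rep(v.names, 2))
print(names(groups))
# [1] "coe" "pop" "rcb"
print(groups)
```

If that list (coe, pop, rcb) is then matched position by position against v.names (pop, coe, rcb), the first two output columns swap, which is exactly the symptom described above.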

Edit: The expected output is:

participant_id  time    pop coe         rcb
FDVCZX          1       6   11.19       16.74
ADSCXZ          1       6   13.6        25
AESFDZC         1       7   3.96        25
ZXCV            1       6   7.64        18.37
AGS             1       6   6.12        25
AGSFV           1       6   6.92        25
FDVCZX          2       NA  NA          NA
ADSCXZ          2       NA  NA          NA
AESFDZC         2       7.1 5.926362    25
ZXCV            2       8   4.89        NA
AGS             2       6   11.98       NA
AGSFV           2       NA  NA          NA

Instead, both calls above put the pop values in the coe column and vice versa.
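A base-R workaround, sketched under the assumption that the alphabetical regrouping of a vector-valued varying is the cause: pass varying as a list with one element per long-format variable, listed in the same order as v.names. A list is used as given instead of being split, so each v.names entry lines up with its own pair of wide columns. To keep the example self-contained, it rebuilds the six rows from the question as a plain data.frame with a character participant_id instead of the dput factor.

```r
# The six rows from the question, with participant_id as character.
dframe <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC", "ZXCV", "AGS", "AGSFV"),
  baseline_pop = c(6, 6, 7, 6, 6, 6),
  baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92),
  baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25),
  final_pop = c(NA, NA, 7.1, 8, 6, NA),
  final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA),
  final_rcb = c(NA, NA, 25L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# One list element per long-format variable, in the same order as v.names;
# within each element the wide columns are given in time order.
long <- reshape(dframe,
                idvar = "participant_id",
                varying = list(c("baseline_pop", "final_pop"),
                               c("baseline_coe", "final_coe"),
                               c("baseline_rcb", "final_rcb")),
                v.names = c("pop", "coe", "rcb"),
                direction = "long")
print(long)
```

With the list ordered to match v.names, pop holds 6, 6, 7, 6, 6, 6 at time 1 and coe holds 11.19, 13.6, ..., matching the expected output.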

We can use melt from data.table, which can melt several groups of measure columns at once.

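library(data.table)
# setDT() converts dframe to a data.table in place. Each regex in patterns()
# collects its matching columns into one value column, named by value.name.
melt(setDT(dframe), measure.vars = patterns('pop', 'coe', 'rcb'),
     value.name = c('pop', 'coe', 'rcb'), variable.name = 'time')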
#    participant_id time pop       coe   rcb
# 1:         FDVCZX    1 6.0 11.190000 16.74
# 2:         ADSCXZ    1 6.0 13.600000 25.00
# 3:        AESFDZC    1 7.0  3.960000 25.00
# 4:           ZXCV    1 6.0  7.640000 18.37
# 5:            AGS    1 6.0  6.120000 25.00
# 6:          AGSFV    1 6.0  6.920000 25.00
# 7:         FDVCZX    2  NA        NA    NA
# 8:         ADSCXZ    2  NA        NA    NA
# 9:        AESFDZC    2 7.1  5.926362 25.00
#10:           ZXCV    2 8.0  4.890000    NA
#11:            AGS    2 6.0 11.980000    NA
#12:          AGSFV    2  NA        NA    NA
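Another base-R option, again only a sketch: reshape() can guess v.names and times from column names laid out as variable, then sep, then time (sep defaults to "."), whereas these columns are laid out as time_variable. Renaming them to pop.baseline, coe.baseline and so on lets reshape() do the guessing without v.names. The sub() pattern is an assumption that every measurement column starts with baseline_ or final_; the full dataset's intermediate visits would need to be added to the alternation. The times come out as the strings "baseline" and "final" rather than 1 and 2.

```r
# The six rows from the question, with participant_id as character.
dframe <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC", "ZXCV", "AGS", "AGSFV"),
  baseline_pop = c(6, 6, 7, 6, 6, 6),
  baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92),
  baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25),
  final_pop = c(NA, NA, 7.1, 8, 6, NA),
  final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA),
  final_rcb = c(NA, NA, 25L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Turn "baseline_pop" into "pop.baseline" (variable first, then time),
# the layout reshape() expects when it guesses from names.
names(dframe)[-1] <- sub("^(baseline|final)_(.*)$", "\\2.\\1", names(dframe)[-1])

# No v.names: reshape() splits each name at "." into variable and time.
long <- reshape(dframe,
                idvar = "participant_id",
                varying = names(dframe)[-1],
                direction = "long")
print(long)
```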