Avoid guessing of 'varying' in reshape()
I'm trying to reshape() some time-varying data in R. I'm using the following dataset:
dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")
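These are time-varying data from a longitudinal study, a subset of a larger dataset that I import from a source file. I want to extract the values pop, coe and rcb for the baseline and final study visits. My full dataset has several visits in between, which I have left out for the purposes of this question.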
I can do the following:
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')
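However, this ends up with the values from pop being labelled as coe. The documentation for reshape() tells me I should name the varying columns explicitly to avoid 'guessing'. So I tried this: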
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')
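This gives exactly the same output, even though the varying argument is now named explicitly. What am I doing wrong? Presumably pop ends up with the values of coe because of alphabetical ordering, but I don't understand why this happens now that I have stated the varying argument explicitly...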
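A likely explanation, offered as an assumption about reshape()'s internals rather than a quote of its source: when varying is a plain character vector and v.names is supplied, the vector appears to be grouped with split(varying, rep(v.names, ntimes)). split() orders its groups by factor level, which is alphabetical, while the groups are later matched to v.names by position. The snippet below isolates only that split() step, using the column names from the question.

```r
# Assumed reconstruction of the grouping step inside reshape(); this is an
# illustration, not reshape()'s actual source code.
v.names <- c("pop", "coe", "rcb")
varying <- c("baseline_pop", "baseline_coe", "baseline_rcb",
             "final_pop", "final_coe", "final_rcb")

# Group the wide column names by long-format variable, two visits each.
groups <- split(varying, rep(v.names, 2))

# split() sorts the groups by factor level (alphabetically), not by v.names.
print(names(groups))  # [1] "coe" "pop" "rcb"

# Each group is internally correct ...
print(groups[["pop"]])  # [1] "baseline_pop" "final_pop"

# ... but pairing groups with v.names by position puts coe columns under "pop".
print(groups[[1]])  # [1] "baseline_coe" "final_coe"
```

If that is what happens, naming the columns explicitly cannot help, because the reordering occurs after the names have been read.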
Edit: the expected output is as follows:
participant_id time pop coe rcb
FDVCZX 1 6 11.19 16.74
ADSCXZ 1 6 13.6 25
AESFDZC 1 7 3.96 25
ZXCV 1 6 7.64 18.37
AGS 1 6 6.12 25
AGSFV 1 6 6.92 25
FDVCZX 2 NA NA NA
ADSCXZ 2 NA NA NA
AESFDZC 2 7.1 5.926362 25
ZXCV 2 8 4.89 NA
AGS 2 6 11.98 NA
AGSFV 2 NA NA NA
With the calls above, however, the pop values end up in the coe column, and vice versa.
We can use melt from data.table, which can melt several groups of measure columns in a single call.
library(data.table)
melt(setDT(dframe), measure.vars = patterns('pop', 'coe', 'rcb'),
     value.name = c('pop', 'coe', 'rcb'), variable.name = 'time')
# participant_id time pop coe rcb
# 1: FDVCZX 1 6.0 11.190000 16.74
# 2: ADSCXZ 1 6.0 13.600000 25.00
# 3: AESFDZC 1 7.0 3.960000 25.00
# 4: ZXCV 1 6.0 7.640000 18.37
# 5: AGS 1 6.0 6.120000 25.00
# 6: AGSFV 1 6.0 6.920000 25.00
# 7: FDVCZX 2 NA NA NA
# 8: ADSCXZ 2 NA NA NA
# 9: AESFDZC 2 7.1 5.926362 25.00
#10: ZXCV 2 8.0 4.890000 NA
#11: AGS 2 6.0 11.980000 NA
#12: AGSFV 2 NA NA NA
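A base-R alternative, which is not part of the original answer and is only a sketch: pass varying as a list with one vector per long-format variable, in the same order as v.names. reshape() then has no character vector to split, so the pairing should not depend on alphabetical order. The data frame dframe2 below is a hypothetical, simplified copy of dframe (plain character IDs instead of the factor), so the example does not depend on the melt() call above having converted dframe to a data.table.

```r
# Hypothetical simplified copy of the question's data (character IDs).
dframe2 <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC", "ZXCV", "AGS", "AGSFV"),
  baseline_pop = c(6, 6, 7, 6, 6, 6),
  baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92),
  baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25),
  final_pop = c(NA, NA, 7.1, 8, 6, NA),
  final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA),
  final_rcb = c(NA, NA, 25, NA, NA, NA)
)

# One list element per long variable, listed in the same order as v.names;
# within each element the columns are ordered by visit (baseline, final).
long <- reshape(
  dframe2,
  idvar = "participant_id",
  timevar = "time",
  v.names = c("pop", "coe", "rcb"),
  varying = list(c("baseline_pop", "final_pop"),
                 c("baseline_coe", "final_coe"),
                 c("baseline_rcb", "final_rcb")),
  times = 1:2,
  direction = "long"
)

print(long)
```

The list form states the grouping directly instead of leaving reshape() to infer it from a flat vector, which is also the form the reshape() help page describes as canonical for varying.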