Complex reshaping of data frame from long-form to wide-form using values in multiple "key" columns in R
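I would like to reshape a long-form data frame of longitudinal clinical trial data into a wide-form data frame. Here is an example of the long format I want to change: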
structure(list(study = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("Jones, 1996",
"Smith, 1999"), class = "factor"), group_allocation = structure(c(2L, 1L, 2L,
3L, 1L), .Label = c("control", "intervention_1", "intervention_2"), class = "factor"),
outcome = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("anxiety", "depression"),
class = "factor"), bl_mean = c(6.5, 4.5, 3.7, 4.2, 5.3), fu_timepoint = c(6L,
6L, 12L, 12L, 12L), fu_mean = c(5.2, 7.5, 2.5, 2.7, 6.3), mean_diff = c(-2.3,
NA, -3.8, -3.6, NA)), class = "data.frame", row.names = c(NA, -5L))
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
2 Smith, 1999          control depression     4.5            6     7.5        NA
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA
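My problem is that, for each study, I need only one observation/row per intervention group (labelled "intervention_1" and "intervention_2" in the group_allocation column), and I need the control group data (labelled "control" in the group_allocation column) moved into separate columns on the same row as each intervention group, so that intervention and control data can be compared in analyses across the data frame. This is what I am looking for: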
structure(list(study = structure(c(2L, 1L, 1L), .Label = c("Jones, 1996",
"Smith, 1999"), class = "factor"), ig_group_allocation = structure(c(1L,
1L, 2L), .Label = c("intervention_1", "intervention_2"), class =
"factor"), outcome = structure(c(2L, 1L, 1L), .Label = c("anxiety",
"depression"), class = "factor"), ig_bl_mean = c(6.5, 3.7, 4.2),
fu_timepoint = c(6L, 12L, 12L), ig_fu_mean = c(5.2, 2.5, 2.7), mean_diff =
c(-2.3, -3.8, -3.6), cg_group_allocation = structure(c(1L, 1L, 1L), .Label
= "control", class = "factor"), cg_bl_mean = c(4.5, 5.3, 5.3), cg_fu_mean
= c(7.5, 6.3, 6.3)), class = "data.frame", row.names = c(NA, -3L))
        study ig_group_allocation    outcome ig_bl_mean fu_timepoint ig_fu_mean mean_diff cg_group_allocation cg_bl_mean cg_fu_mean
1 Smith, 1999      intervention_1 depression        6.5            6        5.2      -2.3             control        4.5        7.5
2 Jones, 1996      intervention_1    anxiety        3.7           12        2.5      -3.8             control        5.3        6.3
3 Jones, 1996      intervention_2    anxiety        4.2           12        2.7      -3.6             control        5.3        6.3
I have read many other data-reshaping questions on Stack Overflow, but have not yet found a solution to a problem similar to mine.
Thanks!
Split your data into two data frames, one for the controls and one for the interventions, then merge them back together.
df
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
2 Smith, 1999          control depression     4.5            6     7.5        NA
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA
# keep the rows whose group_allocation contains "intervention"
interventions <- df[grep("intervention", df$group_allocation), ]
interventions
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
1 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3
3 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8
4 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6
# keep the rows whose group_allocation contains "control"
controls <- df[grep("control", df$group_allocation), ]
controls
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff
2 Smith, 1999          control depression     4.5            6     7.5        NA
5 Jones, 1996          control    anxiety     5.3           12     6.3        NA
names(controls) <- paste0("cg_", names(controls)) # add cg_ prefix to column names
# left join: every intervention row gets the control row of the same study
new_df <- merge(interventions, controls, by.x = "study", by.y = "cg_study", all.x = TRUE)
new_df
        study group_allocation    outcome bl_mean fu_timepoint fu_mean mean_diff cg_group_allocation cg_outcome cg_bl_mean cg_fu_timepoint cg_fu_mean cg_mean_diff
1 Jones, 1996   intervention_1    anxiety     3.7           12     2.5      -3.8             control    anxiety        5.3              12        6.3           NA
2 Jones, 1996   intervention_2    anxiety     4.2           12     2.7      -3.6             control    anxiety        5.3              12        6.3           NA
3 Smith, 1999   intervention_1 depression     6.5            6     5.2      -2.3             control depression        4.5               6        7.5           NA
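The merged result above still carries cg_outcome, cg_fu_timepoint and cg_mean_diff, and its intervention columns have no ig_ prefix, so it does not quite match the layout asked for in the question. A minimal base R sketch of one way to get there follows. It assumes that study, outcome and fu_timepoint together identify a comparison and that each such combination has exactly one control row; neither assumption is stated in the question. The names keys, arm_cols, ig, cg and wide are illustrative only, and the example data is rebuilt with character columns rather than factors to keep the sketch short.

```r
# Example data from the question, rebuilt with character columns (assumption:
# factors are not needed for the reshaping itself).
df <- data.frame(
  study            = c("Smith, 1999", "Smith, 1999", "Jones, 1996",
                       "Jones, 1996", "Jones, 1996"),
  group_allocation = c("intervention_1", "control", "intervention_1",
                       "intervention_2", "control"),
  outcome          = c("depression", "depression", "anxiety", "anxiety", "anxiety"),
  bl_mean          = c(6.5, 4.5, 3.7, 4.2, 5.3),
  fu_timepoint     = c(6L, 6L, 12L, 12L, 12L),
  fu_mean          = c(5.2, 7.5, 2.5, 2.7, 6.3),
  mean_diff        = c(-2.3, NA, -3.8, -3.6, NA),
  stringsAsFactors = FALSE
)

# Assumed join keys: columns shared by an intervention arm and its control.
keys     <- c("study", "outcome", "fu_timepoint")
# Arm-specific columns that should appear once per arm (ig_ / cg_).
arm_cols <- c("group_allocation", "bl_mean", "fu_mean")

is_control <- df$group_allocation == "control"

# mean_diff is only recorded on intervention rows, so keep it there only.
ig <- df[!is_control, c(keys, arm_cols, "mean_diff")]
cg <- df[is_control, c(keys, arm_cols)]

# Prefix the arm-specific columns; the key columns keep their names.
names(ig)[match(arm_cols, names(ig))] <- paste0("ig_", arm_cols)
names(cg)[match(arm_cols, names(cg))] <- paste0("cg_", arm_cols)

# Left join: one row per intervention arm, with its control alongside.
wide <- merge(ig, cg, by = keys, all.x = TRUE)

# Put the columns in the order shown in the question.
wide <- wide[, c("study", "ig_group_allocation", "outcome", "ig_bl_mean",
                 "fu_timepoint", "ig_fu_mean", "mean_diff",
                 "cg_group_allocation", "cg_bl_mean", "cg_fu_mean")]
wide
```

Joining on all three keys rather than on study alone means a study that reports several outcomes or follow-up timepoints would not pair an intervention row with the wrong control row. If a key combination had more than one control row, merge() would repeat the intervention row once per control, so it may be worth checking for duplicates in cg[keys] first.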