Reshape large dataset from long to wide with two ID variables
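I want to reshape my data from long format to wide format using two ID variables.
I have the following code, which works on the example dataset below. However, when I run it on the larger dataset I am actually working with, it runs for a very long time and never seems to finish. With a single ID variable the code runs fine, but I need to include two.
Is there a more efficient way to convert from long to wide format?
(I have also thought about creating a single ID variable from ID1 and ID2 and using that for the long-to-wide conversion. Maybe that is the best solution?)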
# Columns that should be spread across time points
Wide.vars <- c("Date", "V1")
### 1. Reshape from long to wide format with two ID variables
df_wide <- reshape(as.data.frame(df),
                   idvar = c("ID1", "ID2"),
                   direction = "wide",
                   v.names = Wide.vars,
                   timevar = "Timepoint")
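The combined-ID idea mentioned in the question could look roughly like the sketch below. It uses a small made-up data frame (`long`, with illustrative values rather than the asker's data) and assumes that the `"_"` separator never occurs inside ID1 or ID2; whether this is actually faster on the full dataset is not something the sketch demonstrates.

```r
# Made-up long-format data for illustration only
long <- data.frame(
  ID1 = c(1L, 1L, 1L, 2L, 2L),
  ID2 = c("a", "a", "b", "c", "c"),
  Timepoint = c("baseline", "wave1", "baseline", "baseline", "wave1"),
  V1 = c(0, 8, 4, 9.5, 7),
  stringsAsFactors = FALSE
)

# Collapse the two keys into one ID column
# (assumption: "_" does not appear in either key)
long$ID <- paste(long$ID1, long$ID2, sep = "_")

# Reshape on the single combined key; ID1 and ID2 are constant
# within each ID, so they are carried along as ordinary columns
wide <- reshape(long,
                idvar = "ID",
                timevar = "Timepoint",
                v.names = "V1",
                direction = "wide")
```

Each ID1/ID2 pair ends up as one row in `wide`, with one `V1.<timepoint>` column per time point and `NA` where a pair has no measurement at that time point.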
Example data below (note that the example dataset is 15 rows by 5 columns, whereas the dataset I am working with is 15658 rows by 99 columns).
df <- structure(list(ID1 = c(5643923L, 5643923L, 5643923L, 3914822L,
3914822L, 3914822L, 3914822L, 1156115L, 1506426L, 7183921L, 4753447L,
4606792L, 8492773L, 8492773L, 8492773L), ID2 = c("02179",
"02179", "04101", "00819", "00819", "00819", "00819",
"01904", "01127", "00475", "02084", "04118", "15553",
"15553", "15553"), Date = structure(c(16731, 16731,
16731, 16732, 16733, 16733, 16733, 16733, 16733, 16733, 16733,
16733, 16734, 16734, 16734), class = "Date"), Timepoint = structure(c(1L,
3L, 1L, 1L, 3L, 4L, 5L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L), .Label = c("baseline",
"wave0.5", "wave1", "wave2", "wave3", "wave4"), class = "factor"), V1 = c(0, 8, 4, 9.5, 7, 7, 12, 9, 11, 8.4,
7.8, 6.6, 5, 5.5, 8.9)), row.names = c(NA,
-15L), groups = structure(list(CP1_t_210 = structure(1L, .Label = c("baseline",
"wave0.5", "wave1", "wave2", "wave3", "wave4"), class = "factor"),
.rows = structure(list(1:15), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
data.table is usually faster; you can try using dcast.
library(data.table)
dcast(setDT(df), ID1+ID2~Timepoint, value.var = c('Date', 'V1'))
@Mark Davies's suggestion of pivot_wider would also help.
tidyr::pivot_wider(df, names_from = Timepoint, values_from = c(Date, V1))
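Before widening with either approach, it may be worth checking that every ID1/ID2/Timepoint combination occurs only once. When it does not, dcast falls back to counting rows (its fun.aggregate defaults to length) and pivot_wider returns list-columns with a warning. A minimal base-R check, using a made-up data frame `long` that contains one deliberate duplicate, might be:

```r
# Made-up data for illustration; rows 2 and 3 share the same key
long <- data.frame(
  ID1 = c(1L, 1L, 1L, 2L),
  ID2 = c("a", "a", "a", "c"),
  Timepoint = c("baseline", "wave1", "wave1", "baseline"),
  V1 = c(0, 8, 4, 9.5),
  stringsAsFactors = FALSE
)

key_cols <- c("ID1", "ID2", "Timepoint")

# Flag every row whose key combination appears more than once
# (scanning from both ends so the first occurrence is flagged too)
is_dup <- duplicated(long[key_cols]) |
  duplicated(long[key_cols], fromLast = TRUE)

# Rows that would collide in the wide table
dup_rows <- long[is_dup, ]
```

If `dup_rows` is empty, each cell of the wide table receives at most one value; otherwise the duplicates need to be resolved or aggregated first.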