How to transform multi-row column headers and melt to long form in R data.table
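I have a table that I want to convert from wide format to long format. It contains 200+ columns whose headers are spread over multiple rows, as shown below:
Original data: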
# dt
dt <- data.table("1" = c(NA,"Place","dan","uan","yan"),
"2" = c(NA,"Place_2","adan","duan","eyan"),
"3" = c("something","Male",1253,6643,4325),
"4" = c(1998,"Female",624,623,55),
"5" = c(NA,"Trans",13,51,51),
"6" = c("something2","Male",126,63643,725),
"7" = c(1999,"Female",284,243,557),
"8" = c(NA,"Trans",138,541,11))
Starting from the 3rd column, every 3rd column holds the year value:
dt[1,c(3:ncol(dt) %% 3 == 1),with = FALSE]
How can I efficiently transform the multi-row columns into single columns for melting?
Goal:
Place Place_2 Sex Year num
dan adan Male 1998 1253
dan adan Female 1998 624
dan adan Trans 1998 13
dan adan Male 1999 126
dan adan Female 1999 63643
dan adan Trans 1999 725
uan duan Female 1998 6643
....
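The structure of your data.table is unusual. Here is one approach. Steps 1-3 prepare the data.table for melt.
If you need the final output to look exactly like your expected output, you may need to create an ID column before melt and order dt5 by multiple columns. Let me know if you need help with that.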
library(data.table)
# Step 1: Get the year value
col_num <- which(c(3:ncol(dt) %% 3 == 2)) + 1
year_vec <- as.numeric(as.vector(t(dt[1, ..col_num])))
# Step 2: Create all combinations of year and Male, Female, and Trans
year_sex_dt <- CJ(as.vector(t(dt[2, 3:5])), year_vec)
year_sex_dt[, V1 := factor(V1, levels = c("Male", "Female", "Trans"))]
keycol <- c("year_vec", "V1")
setorderv(year_sex_dt, keycol)
new_name <- paste(year_sex_dt[, V1], year_sex_dt[, year_vec], sep = "_")
# Step 3: Assign column names
dt2 <- setnames(dt[c(-1, -2)], c(as.vector(t(dt[2, 1:2])), new_name))
# Step 4: melt the data.table
dt3 <- melt(dt2, id.vars = 1:2, variable.name = "Sex_Year", value.name = "num")
dt4 <- dt3[, c("Sex", "Year") := tstrsplit(Sex_Year, "_", fixed = TRUE)]
dt4[, Sex_Year := NULL]
dt5 <- dt4[, c("Place", "Place_2", "Sex", "Year", "num")]
head(dt5)
# Place Place_2 Sex Year num
# 1: dan adan Male 1998 1253
# 2: uan duan Male 1998 6643
# 3: yan eyan Male 1998 4325
# 4: dan adan Female 1998 624
# 5: uan duan Female 1998 623
# 6: yan eyan Female 1998 55
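If row order matters, one possible way to get closer to the expected output is to sort the melted table afterwards instead of adding an ID column first. The following is only a minimal sketch: dt5_demo is a hypothetical stand-in holding a few rows shaped like dt5, and the choice of sort keys is an assumption about the order you want.

```r
library(data.table)

# Hypothetical stand-in for dt5 (a few rows only, same column layout).
dt5_demo <- data.table(
  Place   = c("dan", "uan", "dan", "uan", "dan", "uan"),
  Place_2 = c("adan", "duan", "adan", "duan", "adan", "duan"),
  Sex     = c("Male", "Male", "Female", "Female", "Male", "Male"),
  Year    = c("1998", "1998", "1998", "1998", "1999", "1999"),
  num     = c("1253", "6643", "624", "623", "126", "63643")
)

# Turn Sex into a factor so it sorts Male, Female, Trans
# instead of alphabetically.
dt5_demo[, Sex := factor(Sex, levels = c("Male", "Female", "Trans"))]

# Sort by place, then year, then sex. setorder() reorders by reference.
setorder(dt5_demo, Place, Place_2, Year, Sex)
print(dt5_demo)
```

Sorting after the melt keeps steps 1-4 unchanged; the factor levels are what control the Male, Female, Trans order within each year.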
Here is what I tried. I think arranging the column names is the key here. I have provided explanations in the code below.
library(data.table)
library(magrittr)  # provides the %>% pipe used below
# Create new column names. Get the 1st row, search for years, repeat each year
# three times, and paste them with three levels of sex.
unlist(dt[1,]) %>%
grep(pattern = "\\d{4}", value = TRUE) %>%
rep(each = 3) %>%
paste(., c("Male", "Female", "Trans"), sep = "_") -> foo
# Set new column names.
setnames(dt, c("Place_1", "Place_2", foo))
# Then, transform the data into a long-format data. Create two new columns
# (i.e., year and sex), and remove the column, variable.
melt(dt[-(1:2)], id.vars = 1:2, measure = patterns("^\\d{4}"))[,
c("year", "sex") := tstrsplit(variable, "_", fixed = TRUE)][, -"variable"] -> out
# Sort the result with Place_1 and Place_2. (This is for showing the result).
out[order(Place_1, Place_2)][]
# Place_1 Place_2 value year sex
# 1: dan adan 1253 1998 Male
# 2: dan adan 624 1998 Female
# 3: dan adan 13 1998 Trans
# 4: dan adan 126 1999 Male
# 5: dan adan 284 1999 Female
# 6: dan adan 138 1999 Trans
# 7: uan duan 6643 1998 Male
# 8: uan duan 623 1998 Female
# 9: uan duan 51 1998 Trans
#10: uan duan 63643 1999 Male
#11: uan duan 243 1999 Female
#12: uan duan 541 1999 Trans
#13: yan eyan 4325 1998 Male
#14: yan eyan 55 1998 Female
#15: yan eyan 51 1998 Trans
#16: yan eyan 725 1999 Male
#17: yan eyan 557 1999 Female
#18: yan eyan 11 1999 Trans
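One thing to keep in mind: because every original column mixes text and numbers, all columns are character, so value and year in the result are still character after the melt. If you need them as numbers, something like the following might work. This is only a sketch; out_demo is a hypothetical two-row stand-in for out, and num_cols is a name introduced here for illustration.

```r
library(data.table)

# Hypothetical stand-in for `out` (two rows, same column layout).
out_demo <- data.table(
  Place_1 = c("dan", "uan"),
  Place_2 = c("adan", "duan"),
  value   = c("1253", "6643"),
  year    = c("1998", "1998"),
  sex     = c("Male", "Male")
)

# Columns that hold numbers stored as text.
num_cols <- c("value", "year")

# Convert them in place; .SDcols restricts .SD to the chosen columns.
out_demo[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]
str(out_demo)
```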