Reshape long to wide with two columns to expand in R data.table [R]
I am trying to reshape the following data from long to wide, expanding two columns. What is the most efficient way to do this in R?
Sample data:
data_sample <- data.frame(code=c(1,1,2,3,4,2,4,3),name=c("bill","bob","rob","max","mitch","john","bart","joe"),numberdata=c(100,400,300,-200,300,-500,100,-400))
Desired result (the dataset the function should produce):
data_desired <- data.frame(code=c(1,2,3,4),name1=c("bill","rob","max","mitch"),name2=c("bob","john","joe","bart"),numberdata1=c(100,300,-200,300),numberdata2=c(400,-500,-400,100))
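I am working with large data (the real codes run from 1 to 100,000). Is there an efficient data.table way to do this? Thanks!
If the exact naming does not matter (e.g. name1, name2, etc.), you can
- create a new grouping variable that numbers the consecutive observations within each code
- split the data by this group
- set a key on each piece so the resulting data.tables can be merged
- finally perform a keyed merge on the results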
library(data.table)

data_sample <- data.frame(code=c(1,1,2,3,4,2,4,3),name=c("bill","bob","rob","max","mitch","john","bart","joe"),numberdata=c(100,400,300,-200,300,-500,100,-400))
setDT(data_sample)
# Number the observations within each code; this index is the new group variable "*group*"
data_sample[, `*group*` := seq.int(.N), by = code]
# Split the data.table into one table per group index
groups <- split(data_sample, by = '*group*')
# Key each piece on "code" and drop the helper column (both by reference);
# invisible() only suppresses the printed return value of lapply
invisible(lapply(groups, \(dt){
  setkey(dt, code)
  # Remove group
  dt[, `*group*` := NULL]
}))
# Merge the pieces on their key (using Reduce here)
res <- Reduce(\(dt1, dt2) merge(dt1, dt2, all = TRUE), groups)
# Reorder columns alphabetically
setcolorder(res, sort(names(res)))
res
code name.x name.y numberdata.x numberdata.y
1: 1 bill bob 100 400
2: 2 rob john 300 -500
3: 3 max joe -200 -400
4: 4 mitch bart 300 100
The advantage of this approach is, of course, that it also works when there are more than two entries per code.
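As a rough sketch of the "more than two entries" case: the variant below adds one extra row for code 1 (the name "amy" and the value 50 are made up for illustration) and renames the value columns with the group index before merging, so the result carries name1, name2, name3 rather than merge's .x/.y suffixes. The helper names `idx`, `value_cols` and `wide` are hypothetical and not part of the original answer.

```r
library(data.table)

# Sample data from the question plus one invented third entry for code 1
dt <- data.table(
  code       = c(1, 1, 2, 3, 4, 2, 4, 3, 1),
  name       = c("bill", "bob", "rob", "max", "mitch", "john", "bart", "joe", "amy"),
  numberdata = c(100, 400, 300, -200, 300, -500, 100, -400, 50)
)

# Running index of each observation within its code
dt[, idx := rowid(code)]

# One data.table per index; keep.by = FALSE drops the helper column
groups <- split(dt, by = "idx", keep.by = FALSE)

# Suffix the value columns with the group index (on copies, to be safe)
value_cols <- c("name", "numberdata")
groups <- Map(function(g, suffix) {
  setnames(copy(g), value_cols, paste0(value_cols, suffix))
}, groups, names(groups))

# Full outer merge of all pieces on "code"
wide <- Reduce(function(a, b) merge(a, b, by = "code", all = TRUE), groups)
wide
```

Codes that have fewer entries than the maximum end up with NA in the later columns, because the merge uses all = TRUE.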
You can use dcast:
library(data.table)
setDT(data_sample)
dcast(data_sample, code~rowid(code), value.var = c('name', 'numberdata'))
# code name_1 name_2 numberdata_1 numberdata_2
#1: 1 bill bob 100 400
#2: 2 rob john 300 -500
#3: 3 max joe -200 -400
#4: 4 mitch bart 300 100
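If the column names should match the desired output exactly (name1 rather than name_1), one option is dcast's `sep` argument. This is a minimal sketch that rebuilds the sample data so it runs on its own; the result name `wide` is only illustrative.

```r
library(data.table)

data_sample <- data.table(
  code       = c(1, 1, 2, 3, 4, 2, 4, 3),
  name       = c("bill", "bob", "rob", "max", "mitch", "john", "bart", "joe"),
  numberdata = c(100, 400, 300, -200, 300, -500, 100, -400)
)

# sep = "" joins each value column name and the rowid without an underscore
wide <- dcast(
  data_sample,
  code ~ rowid(code),
  value.var = c("name", "numberdata"),
  sep = ""
)
wide
```

The same call also covers codes with more than two rows: each extra row adds another pair of columns, and codes with fewer rows get NA there.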