Reshape data frame by id and variable type
I'm having trouble rearranging the following data frame:
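# First id: two firstName and two secondName entries
dat1 <- data.frame(
  id = rep(1, 4),
  var = paste0(rep(c("firstName", "secondName"), each = 2), rep(1:2, 2)),
  value = 1:4
)
# Second id: only the first three entries, so secondName2 is missing
dat2 <- data.frame(
  id = rep(2, 3),
  var = paste0(rep(c("firstName", "secondName"), each = 2)[1:3], rep(1:2, 2)[1:3]),
  value = 5:7
)
dat <- rbind(dat1, dat2)
# Strip the digits from var to get the variable type
dat$type <- gsub('[0-9]', '', dat$var)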
# > dat
#   id         var value
# 1  1  firstName1     1
# 2  1  firstName2     2
# 3  1 secondName1     3
# 4  1 secondName2     4
# 5  2  firstName1     5
# 6  2  firstName2     6
# 7  2 secondName1     7
I would like to get the following result:
id firstName secondName
 1         1          3
 1         2          4
 2         5          7
 2         6         NA
I tried unstack(dat, form = value ~ type), but that did not work.
Update to the question:
firstName1 should correspond to secondName1, so if I change dat2 to
dat2 <- data.frame(
  id = rep(2, 3),
  var = paste0(rep(c("firstName", "secondName"), each = 2)[2:4], rep(1:2, 2)[2:4]),
  value = 5:7
)
# > dat
#    id         var value       type
# 1:  1  firstName1     1  firstName
# 2:  1  firstName2     2  firstName
# 3:  1 secondName1     3 secondName
# 4:  1 secondName2     4 secondName
# 5:  2  firstName2     5  firstName
# 6:  2 secondName1     6 secondName
# 7:  2 secondName2     7 secondName
For id = 2, the (firstName, secondName) pairs should then be c(NA, 6) and c(5, 7). How can this case be handled?
Try dcast:
res <- data.table::dcast(
dat,
id + substring(as.character(var), nchar(as.character(var))) ~ type,
value.var = 'value')
res[2] <- NULL
# > res
#   id firstName secondName
# 1  1         1          3
# 2  1         2          4
# 3  2         5          7
# 4  2         6         NA
substring(as.character(var), nchar(as.character(var))) extracts the last character of the second column (var), which then serves as the grouping variable.
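As a small illustration of the substring() call described above, the sketch below applies it to a made-up vector x (x is an assumption for this example, not part of the question's data). Starting the substring at position nchar(x) returns only the final character, so the approach assumes single-digit suffixes; a suffix such as 10 would be cut down to its last digit.

```r
# Hypothetical input vector, mirroring the var column in the question
x <- c("firstName1", "firstName2", "secondName1", "secondName10")

# Start at the last position, so only the final character is returned
last_char <- substring(x, nchar(x))
print(last_char)
```

If multi-digit indices can occur, matching the whole trailing number with a regular expression is likely the safer choice.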
I think a better option is to use the rowid function from data.table:
library(data.table)
dcast(setDT(dat), id + rowid(type) ~ type, value.var = 'value')[, type := NULL][]
which gives:
   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2         5          7
4:  2         6         NA
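To make the role of rowid(type) more concrete, here is a rough base R sketch of the same idea: a running counter within each value of type. It uses ave() as a stand-in rather than data.table itself, and the type vector is typed out by hand to match the original data, so treat it as an illustration of the concept only.

```r
# type column of the original data, written out literally (illustrative)
type <- c("firstName", "firstName", "secondName", "secondName",
          "firstName", "firstName", "secondName")

# Running count of each type: the 1st, 2nd, ... occurrence of that value
row_in_type <- ave(seq_along(type), type, FUN = seq_along)
print(row_in_type)
```

Because the counter depends only on position, it pairs the n-th firstName with the n-th secondName regardless of the digit in var. That is fine for the original data but not for the updated question, where the digit itself has to drive the pairing.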
For the updated question:
# Extract the trailing digit of var; note the doubled backslash in '\\1'
setDT(dat)[, num := gsub('.*([0-9])', '\\1', var)
           ][, dcast(.SD, id + num ~ type, value.var = 'value')
             ][, num := NULL][]
which gives:
   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2        NA          6
4:  2         5          7
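The gsub() pattern above captures a single trailing digit. As a hedged variation, the base R sketch below pulls out the whole trailing number instead, which may matter if indices can reach 10 or more. The example vector var is invented for illustration.

```r
# Hypothetical values; the last one has a two-digit index
var <- c("firstName2", "secondName1", "secondName12")

# regexpr() locates the run of digits at the end of each string,
# regmatches() returns the matched text
num <- regmatches(var, regexpr("[0-9]+$", var))

# Removing that same trailing run leaves the variable type
type <- sub("[0-9]+$", "", var)

print(num)
print(type)
```

Note that regmatches() silently drops elements without a match, so this sketch assumes every value of var ends in at least one digit.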
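With tidyr:
library(tidyr)
# Split var just before its trailing digits; the backslash must be doubled in an R string
rbind(dat1, dat2) %>% separate(var, c("name", "index"), "(?=\\d+$)") %>%
  spread(key = name, value = value)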
Result:
  id index firstName secondName
1  1     1         1          3
2  1     2         2          4
3  2     1         5          7
4  2     2         6         NA
Note: to drop the index column, add %>% dplyr::select(-index) at the end.
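For readers who prefer to avoid extra packages, the same split-then-widen idea can be sketched in base R with reshape(). This is not one of the approaches given above, only an illustration under the assumption that every var ends in a numeric index. The data are the updated question's values written out literally, and the names index and wide are chosen for this example.

```r
# Data from the updated question, written out literally
dat <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2),
  var = c("firstName1", "firstName2", "secondName1", "secondName2",
          "firstName2", "secondName1", "secondName2"),
  value = 1:7
)

# Split var into its name part and its trailing numeric index
dat$type <- sub("[0-9]+$", "", dat$var)
dat$index <- as.integer(regmatches(dat$var, regexpr("[0-9]+$", dat$var)))

# One row per (id, index), one value column per type
wide <- reshape(
  dat[, c("id", "index", "type", "value")],
  idvar = c("id", "index"),
  timevar = "type",
  direction = "wide"
)

# reshape() names the new columns value.<type>; strip the prefix and sort
names(wide) <- sub("^value\\.", "", names(wide))
wide <- wide[order(wide$id, wide$index), ]
rownames(wide) <- NULL
print(wide)
```

Missing combinations, such as firstName1 for id = 2, come out as NA, which is the behavior the updated question asks for.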