Casting from Long to Wide format, with each repeat creating a new row
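I have a data frame in long format that I want to cast to wide format. The data frame has several repeated identifiers, which I want to treat as unique instances and represent as separate rows in the wide data frame.
My question is similar to this one:
Forcing unique values before casting (pivoting) in R
But in that question, the unique entries end up as separate columns. For my problem, I want the data placed in separate rows. For example: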
ID1<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")
ID2<-c("R","R","R","L","L","R","R","L","L","R","R","L","L","R","R")
Sp<-c("Bird","Cat","Bird","Bird","Dog","Dog","Dog","Cat","Cat","Bird","Cat","Dog","Bird","Bird","Cat")
Count<-c(1,2,2,1,2,1,2,3,2,1,2,3,2,1,5)
DF<-data.frame(ID1,ID2,Sp,Count)
After casting the data to wide format, I would like the output to look like this:
ID1 ID2 Bird Cat Dog
A   R   1    2   0
A   R   2    0   0   # 2 Birds in the A/R combination so need second row (don't want to add them together)
A   L   1    0   2
B   R   1    0   1
B   R   0    0   2
B   L   0    3   0
B   L   0    2   0
C   R   1    2   0
C   R   0    5   0
C   L   2    0   3
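If there are no repeats within a unique ID1/ID2 combination, the cast proceeds as normal. But if there are repeats, a second (or third, or fourth) row is created.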
You can create an auxiliary ID column that numbers the rows within each group of ID1, ID2 and Sp, and then use ID1, ID2 and AUXID as the id columns:
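library(dplyr)
# Number the repeats within each ID1/ID2/Sp group so every long row gets a unique key
DF <- DF %>% group_by(ID1, ID2, Sp) %>% mutate(AUXID = row_number()) %>% as.data.frame()
# One output row per ID1/ID2/AUXID, one Count column per species
reshape(DF, idvar = c("ID1", "ID2", "AUXID"), timevar = "Sp", direction = "wide")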
#    ID1 ID2 AUXID Count.Bird Count.Cat Count.Dog
# 1    A   R     1          1         2        NA
# 3    A   R     2          2        NA        NA
# 4    A   L     1          1        NA         2
# 6    B   R     1          1        NA         1
# 7    B   R     2         NA        NA         2
# 8    B   L     1         NA         3        NA
# 9    B   L     2         NA         2        NA
# 11   C   R     1          1         2        NA
# 12   C   L     1          2        NA         3
# 15   C   R     2         NA         5        NA
You can drop the AUXID column and then fill in the NA values.
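The cleanup step is only described above, so here is a minimal base R sketch of one way it might be done. It assumes that a missing species should count as 0, as in the desired output, and that the `Count.` prefix should be stripped from the column names. To stay runnable without packages, it replaces the dplyr step with `ave()`, which is a stand-in and not part of the original answer. The name `wide` is chosen for illustration.

```r
# Rebuild the example data from the question
DF <- data.frame(
  ID1 = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
  ID2 = c("R","R","R","L","L","R","R","L","L","R","R","L","L","R","R"),
  Sp = c("Bird","Cat","Bird","Bird","Dog","Dog","Dog","Cat","Cat","Bird",
         "Cat","Dog","Bird","Bird","Cat"),
  Count = c(1,2,2,1,2,1,2,3,2,1,2,3,2,1,5)
)

# Base R stand-in for the dplyr step: number repeats within each ID1/ID2/Sp group
DF$AUXID <- ave(seq_along(DF$Count), DF$ID1, DF$ID2, DF$Sp, FUN = seq_along)

# Cast to wide: one row per ID1/ID2/AUXID, one Count.<species> column per species
wide <- reshape(DF, idvar = c("ID1", "ID2", "AUXID"),
                timevar = "Sp", direction = "wide")

wide$AUXID <- NULL                                # drop the helper column
wide[is.na(wide)] <- 0                            # assumption: absent species means a count of 0
names(wide) <- sub("^Count\\.", "", names(wide))  # Count.Bird -> Bird
rownames(wide) <- NULL                            # renumber the rows 1..n
wide
```

The rows come out in order of first appearance in the long data, so sort the result afterwards if a particular order matters.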
Here is a data.table version using dcast(), which provides a fill argument for filling in the NA values:
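library(data.table)
# Number repeats within each ID1/ID2/Sp group, cast with empty cells filled by 0,
# then drop the helper column; the trailing [] prints the result
(dcast(setDT(DF)[, AUXID := 1:.N, .(ID1, ID2, Sp)],
       ID1 + ID2 + AUXID ~ Sp, value.var = "Count", fill = 0)
  [, AUXID := NULL][])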
#     ID1 ID2 Bird Cat Dog
#  1:   A   L    1   0   2
#  2:   A   R    1   2   0
#  3:   A   R    2   0   0
#  4:   B   L    0   3   0
#  5:   B   L    0   2   0
#  6:   B   R    1   0   1
#  7:   B   R    0   0   2
#  8:   C   L    2   0   3
#  9:   C   R    1   2   0
# 10:   C   R    0   5   0