Tricky reshaping of a data frame
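I have a data frame containing inventory data for the past 12 months. Below I have created a mock data frame covering a few months that resembles my dataset.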
inventory <- data.frame(ID=c(1,1,1,1,2,2,3,3,3,3,4,4,4),
SKU=c("375F","375F","375F","375F","QX51","QX51","AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
inventory=c(3,4,14,5,18,5,4,13,4,10,3,2,2),
sold=c(3,2,0,1,4,0,0,3,1,5,0,2,1),
returned=c(1,0,2,0,0,0,1,0,1,1,0,2,0),
month=c(0,1,2,3,0,2,3,0,1,2,3,2,3))
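I am trying to manipulate the data frame to produce a report that shows each variable with its ID and SKU, plus one column per month, as in the image below.
[Image: reshaped data frame]
I tried the dplyr and data.table libraries without any success. How can I transform the data so that each month gets its own column, like the image I posted? I am still quite new to R, so please go easy on me. Thanks.
We can use tidyr.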
library(dplyr)
library(tidyr)
gather(inventory, Variable,value, inventory:returned) %>% #reshape to long
mutate(month = paste0("Month", month)) %>% #concat with "Month" string
spread(month, value)#reshape to wide
# ID SKU Variable Month0 Month1 Month2 Month3
#1 1 375F inventory 3 4 14 5
#2 1 375F returned 1 0 2 0
#3 1 375F sold 3 2 0 1
#4 2 QX51 inventory 18 NA 5 NA
#5 2 QX51 returned 0 NA 0 NA
#6 2 QX51 sold 4 NA 0 NA
#7 3 AEC inventory 13 4 10 4
#8 3 AEC returned 0 1 1 1
#9 3 AEC sold 3 1 5 0
#10 4 115332H inventory NA 3 2 2
#11 4 115332H returned NA 0 2 0
#12 4 115332H sold NA 0 2 1
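In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshape with those functions, assuming tidyr 1.1.0 or later (for names_sort) and the de-duplicated sample data in which ID 4 covers months 1, 2 and 3. The name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data with the duplicate month for ID 4 already removed (assumption)
inventory <- data.frame(
  ID = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU = c("375F","375F","375F","375F","QX51","QX51",
          "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month = c(0,1,2,3,0,2,3,0,1,2,1,2,3)
)

wide <- inventory %>%
  # long format: one row per ID/SKU/month/variable
  pivot_longer(inventory:returned, names_to = "Variable", values_to = "value") %>%
  # wide format: one column per month, prefixed with "Month"
  pivot_wider(names_from = month, values_from = value,
              names_prefix = "Month", names_sort = TRUE)

print(wide)
```

Months missing for a given ID/SKU end up as NA, as in the output above.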
ID = 4 and SKU = 115332H has duplicate entries, so I had to change a value to remove the duplicates.
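One way to locate such duplicates before reshaping is to look for repeated ID/SKU/month combinations. This is a small base R sketch run on the data as posted in the question; the names `key` and `dup_rows` are just illustrative.

```r
# Data exactly as posted in the question (month 3 appears twice for ID 4)
inventory <- data.frame(
  ID = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU = c("375F","375F","375F","375F","QX51","QX51",
          "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month = c(0,1,2,3,0,2,3,0,1,2,3,2,3)
)

# Columns that should uniquely identify a row
key <- inventory[, c("ID", "SKU", "month")]

# Flag every row whose key occurs more than once (both directions,
# so the first occurrence is reported as well)
dup_rows <- inventory[duplicated(key) | duplicated(key, fromLast = TRUE), ]
print(dup_rows)
```

Any rows returned here need to be corrected or aggregated before a wide reshape can assign a single value per month.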
# Creating the data frame
inventory <- data.frame(ID=c(1,1,1,1,2,2,3,3,3,3,4,4,4),
SKU=c("375F","375F","375F","375F","QX51","QX51","AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
inventory=c(3,4,14,5,18,5,4,13,4,10,3,2,2), sold=c(3,2,0,1,4,0,0,3,1,5,0,2,1),
returned=c(1,0,2,0,0,0,1,0,1,1,0,2,0),
month=c(0,1,2,3,0,2,3,0,1,2,1,2,3))
# Reshaping the data
library(reshape2)  # provides melt()
# Melting the data frame to long format
inv2 <- melt(inventory, id = c("ID", "SKU", "month"))
# Reshaping to wide format, one column per month
inv2_wide <- reshape(inv2, v.names = "value", idvar = c("ID", "SKU", "variable"),
timevar = "month", direction = "wide")
# Ordering by ID variables
inv2_wide <- inv2_wide[order(inv2_wide$ID, inv2_wide$SKU), ]
# Renaming the variables (the dot must be escaped with a double backslash in an R string)
names(inv2_wide) <- gsub("value\\.", "Month", names(inv2_wide))
ID SKU variable Month0 Month1 Month2 Month3
1 1 375F inventory 3 4 14 5
14 1 375F sold 3 2 0 1
27 1 375F returned 1 0 2 0
5 2 QX51 inventory 18 NA 5 NA
18 2 QX51 sold 4 NA 0 NA
31 2 QX51 returned 0 NA 0 NA
7 3 AEC inventory 13 4 10 4
20 3 AEC sold 3 1 5 0
33 3 AEC returned 0 1 1 1
11 4 115332H inventory NA 3 2 2
24 4 115332H sold NA 0 2 1
37 4 115332H returned NA 0 2 0
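Since the question mentions data.table, the same melt-then-widen idea can also be sketched with that package's own melt() and dcast(). This is only an illustration on the de-duplicated sample data, not part of the answer above; the names `dt`, `long_dt` and `wide_dt` are made up for the example.

```r
library(data.table)

# De-duplicated sample data (ID 4 covers months 1, 2, 3)
dt <- data.table(
  ID = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU = c("375F","375F","375F","375F","QX51","QX51",
          "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month = c(0,1,2,3,0,2,3,0,1,2,1,2,3)
)

# Long format: one row per ID/SKU/month/variable
long_dt <- melt(dt, id.vars = c("ID", "SKU", "month"),
                variable.name = "variable", value.name = "value")

# Turn the month number into a column label such as "Month0"
long_dt[, month := paste0("Month", month)]

# Wide format: one column per month label
wide_dt <- dcast(long_dt, ID + SKU + variable ~ month, value.var = "value")
print(wide_dt)
```

With real data covering months 10 and 11, labels such as "Month10" would sort before "Month2" alphabetically, so zero-padding the month number may be worth considering.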