Tricky reshaping of a data frame

I have a data frame containing inventory data for the past 12 months. Below is a three-month mock data frame that resembles my data set.

inventory <- data.frame(ID=c(1,1,1,1,2,2,3,3,3,3,4,4,4),
                        SKU=c("375F","375F","375F","375F","QX51","QX51","AEC","AEC","AEC","AEC","115332H","115332H","115332H"), 
                        inventory=c(3,4,14,5,18,5,4,13,4,10,3,2,2), 
                        sold=c(3,2,0,1,4,0,0,3,1,5,0,2,1), 
                        returned=c(1,0,2,0,0,0,1,0,1,1,0,2,0), 
                        month=c(0,1,2,3,0,2,3,0,1,2,3,2,3))

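I am trying to manipulate the data frame to produce a report that lists each variable with its ID and SKU, with one column per month, as shown in the image below.

[Image: Reshaping dataframe]

I tried using the dplyr and data.table libraries, but without any success. How can I transform the data so that each month has its own column, as in the image I posted? I'm still fairly new to R, so please go easy on me. Thanks.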

We can use tidyr.

library(dplyr)
library(tidyr)
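# Note: the output below corresponds to ID 4 having months 1, 2, 3.
# With month 3 listed twice for ID 4, spread() stops with a duplicate-keys error.
gather(inventory, Variable, value, inventory:returned) %>%  # reshape to long
  mutate(month = paste0("Month", month)) %>%                # prefix with "Month"
  spread(month, value)                                      # reshape to wide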
#   ID     SKU  Variable Month0 Month1 Month2 Month3
#1   1    375F inventory      3      4     14      5
#2   1    375F  returned      1      0      2      0
#3   1    375F      sold      3      2      0      1
#4   2    QX51 inventory     18     NA      5     NA
#5   2    QX51  returned      0     NA      0     NA
#6   2    QX51      sold      4     NA      0     NA
#7   3     AEC inventory     13      4     10      4
#8   3     AEC  returned      0      1      1      1
#9   3     AEC      sold      3      1      5      0
#10  4 115332H inventory     NA      3      2      2
#11  4 115332H  returned     NA      0      2      0
#12  4 115332H      sold     NA      0      2      1
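
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with those functions follows. It assumes the corrected data in which ID 4 covers months 1, 2 and 3, and the name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Assumption: corrected data, ID 4 has months 1, 2, 3 (no duplicate keys)
inventory <- data.frame(
  ID        = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU       = c("375F","375F","375F","375F","QX51","QX51",
                "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold      = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned  = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month     = c(0,1,2,3,0,2,3,0,1,2,1,2,3)
)

wide <- inventory %>%
  # long format: one row per ID, SKU, month and variable
  pivot_longer(inventory:returned, names_to = "Variable", values_to = "value") %>%
  # wide format: one column per month, named Month0, Month1, ...
  pivot_wider(names_from = month, values_from = value, names_prefix = "Month") %>%
  arrange(ID, SKU, Variable)

print(wide)
```

Months that are missing for an ID (for example months 1 and 3 for ID 2) come out as NA, as in the gather()/spread() version.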

ID = 4 with SKU = 115332H has duplicate entries (month 3 appears twice), so I had to change the month values to remove the duplicates.
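
Reshaping to wide format needs each combination of ID, SKU and month to identify a single row; otherwise an output cell has more than one candidate value. A small base R sketch for locating the offending rows in the question's original data (the names `key` and `dup` are only illustrative):

```r
# The question's original data, where ID 4 lists month 3 twice
inventory <- data.frame(
  ID        = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU       = c("375F","375F","375F","375F","QX51","QX51",
                "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold      = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned  = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month     = c(0,1,2,3,0,2,3,0,1,2,3,2,3)
)

key <- inventory[, c("ID", "SKU", "month")]

# Flag every row whose (ID, SKU, month) key occurs more than once
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Show the rows that share a key
print(inventory[dup, ])
```

How to resolve such duplicates (renumbering months, summing, keeping the latest row) depends on what the data mean; the answer here simply renumbers the months for ID 4.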

# Creating the data frame
inventory <- data.frame(ID=c(1,1,1,1,2,2,3,3,3,3,4,4,4),
                        SKU=c("375F","375F","375F","375F","QX51","QX51","AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
                        inventory=c(3,4,14,5,18,5,4,13,4,10,3,2,2),
                        sold=c(3,2,0,1,4,0,0,3,1,5,0,2,1),
                        returned=c(1,0,2,0,0,0,1,0,1,1,0,2,0),
                        month=c(0,1,2,3,0,2,3,0,1,2,1,2,3))

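# Reshaping the data
library(reshape2)  # melt() is assumed to come from reshape2

# Melt to long format: one row per ID, SKU, month and variable
inv2 <- melt(inventory, id.vars = c("ID", "SKU", "month"))

# Reshape to wide format: one value column per month
inv2_wide <- reshape(inv2, v.names = "value", idvar = c("ID", "SKU", "variable"),
                     timevar = "month", direction = "wide")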

# Ordering by ID variables
inv2_wide <- inv2_wide[order(inv2_wide$ID,inv2_wide$SKU),]

# Renaming the variables
# The dot must be escaped as "\\." inside an R string; a bare "\." is a syntax error
names(inv2_wide) <- gsub("value\\.", "Month", names(inv2_wide))


   ID     SKU  variable Month0 Month1 Month2 Month3
1   1    375F inventory      3      4     14      5
14  1    375F      sold      3      2      0      1
27  1    375F  returned      1      0      2      0
5   2    QX51 inventory     18     NA      5     NA
18  2    QX51      sold      4     NA      0     NA
31  2    QX51  returned      0     NA      0     NA
7   3     AEC inventory     13      4     10      4
20  3     AEC      sold      3      1      5      0
33  3     AEC  returned      0      1      1      1
11  4 115332H inventory     NA      3      2      2
24  4 115332H      sold     NA      0      2      1
37  4 115332H  returned     NA      0      2      0
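
Since the question mentions data.table, here is a hedged sketch of the same melt-then-widen approach using data.table's own melt() and dcast(). It assumes the corrected months for ID 4, and the names `long`, `wide` and `month_cols` are only illustrative.

```r
library(data.table)

# Assumption: corrected data, ID 4 has months 1, 2, 3 (no duplicate keys)
inventory <- data.table(
  ID        = c(1,1,1,1,2,2,3,3,3,3,4,4,4),
  SKU       = c("375F","375F","375F","375F","QX51","QX51",
                "AEC","AEC","AEC","AEC","115332H","115332H","115332H"),
  inventory = c(3,4,14,5,18,5,4,13,4,10,3,2,2),
  sold      = c(3,2,0,1,4,0,0,3,1,5,0,2,1),
  returned  = c(1,0,2,0,0,0,1,0,1,1,0,2,0),
  month     = c(0,1,2,3,0,2,3,0,1,2,1,2,3)
)

# Long format: one row per ID, SKU, month and variable
long <- melt(inventory, id.vars = c("ID", "SKU", "month"))

# Wide format: one column per month value
wide <- dcast(long, ID + SKU + variable ~ month, value.var = "value")

# dcast names the new columns after the month values; prefix them with "Month"
month_cols <- setdiff(names(wide), c("ID", "SKU", "variable"))
setnames(wide, month_cols, paste0("Month", month_cols))

print(wide)
```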