Cannot add dates of zero demand to zoo time series due to duplicate dates

Question

I apologize if I have broken any of the posting rules. The data table below is a sample of what I want to convert into a time series.

> Materials
MaterialID  Date     Quantity
   1      2011-01-04      13
   1      2011-01-04      5
   2      2011-01-07      9
   3      2011-01-09      3
   3      2011-01-11      10

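It consists of transaction entries for several Material items between 2011 and 2014. The date range of the whole data set is 4 January 2011 to 31 December 2014. I would like to create an entry for every material on every date in this period, accounting for missing dates by setting the Quantity variable to zero. In other words, the result I want has an entry for every Material in the data set on every date between 4 January 2011 and 31 December 2014, as shown below: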

   Date    MaterialID_1  MaterialID_2 MaterialID_3
2011-01-04    13               0          0
2011-01-04    5                0          0
2011-01-05    0                0          0
2011-01-06    0                0          0
2011-01-07    0                9          0
2011-01-08    0                0          0
2011-01-09    0                0          3
2011-01-10    0                0          10
2011-01-11    0                0          0
    .         .                .          .
    .         .                .          .
    .         .                .          .
2014-12-31    0                0          0

I have tried some approaches I found on the forum, such as "Add months of zero demand to zoo time series", but because I have duplicate dates I get the error "index entries in ‘order.by’ are not unique". I would appreciate any advice or help.

Once the data is in this format, I intend to reshape the data set for batch forecasting. Thanks.

See the input data below:

dput(Data)
structure(list(MaterialID = c(1L, 1L, 2L, 3L, 1L), Date = c("2011-01-04", 
"2011-01-04", "2011-01-07", "2011-01-09", "2011-01-11"), Quantity = c(13L, 
5L, 9L, 3L, 10L)), .Names = c("MaterialID", "Date", "Quantity"
), class = "data.frame", row.names = c(NA, -5L))

Answer 1

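I use expand.grid to get all combinations of material and date, then merge() to join the observed quantities onto them. I am using random data here.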

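# random sample data: 10 transactions spread over roughly four years
df <- data.frame(materialid = rpois(10, 3), date = as.Date(seq(1, 365 * 4, length.out = 10), origin = '2011-01-01'), quantity = rpois(10, 100))

# every combination of material id and calendar day in the observed range
df2 <- expand.grid(unique(df$materialid), as.Date(min(df$date):max(df$date), origin = '1970-01-01'))
names(df2) <- c('materialid', 'date')

# left join onto the full grid, then set days with no transaction to zero
df2 <- merge(df2, df, by = c('materialid', 'date'), all.x = TRUE)
df2$quantity[is.na(df2$quantity)] <- 0
summary(df2)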
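
To tie this back to the data in the question, here is a minimal sketch of the same expand.grid + merge() idea applied to the dput(Data) sample, followed by a reshape to one column per material. It assumes that summing same-day transactions for a material is acceptable (the desired output in the question keeps them as separate rows instead). The names daily, grid, full and wide are made up for illustration.

```r
# The sample from dput(Data) in the question
Data <- data.frame(
  MaterialID = c(1L, 1L, 2L, 3L, 1L),
  Date = as.Date(c("2011-01-04", "2011-01-04", "2011-01-07",
                   "2011-01-09", "2011-01-11")),
  Quantity = c(13L, 5L, 9L, 3L, 10L)
)

# Assumption: duplicate (material, date) rows are collapsed by summing
daily <- aggregate(Quantity ~ MaterialID + Date, data = Data, FUN = sum)

# Every material on every calendar day of the range stated in the question
all_days <- seq(as.Date("2011-01-04"), as.Date("2014-12-31"), by = "day")
grid <- expand.grid(MaterialID = sort(unique(Data$MaterialID)),
                    Date = all_days)

# Left join onto the full grid; days without a transaction become zero
full <- merge(grid, daily, by = c("MaterialID", "Date"), all.x = TRUE)
full$Quantity[is.na(full$Quantity)] <- 0

# Long to wide: one row per day, columns Quantity_1, Quantity_2, Quantity_3
wide <- reshape(full, idvar = "Date", timevar = "MaterialID",
                direction = "wide", sep = "_")
wide <- wide[order(wide$Date), ]
head(wide, 8)
```

If the duplicate rows have to stay separate, the aggregate() step has to go, and the wide reshape then needs some other way to tell the two same-day rows apart.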

Answer 2

You can do this with a split-apply-combine operation on xts objects. Unlike zoo, xts objects allow duplicate index values.

library(xts)

# sample data; the dates are in year-day-month order, hence "%Y-%d-%m" below
Data <- read.csv(text = "MaterialID,Date,Quantity
1,2011-01-04,13
1,2011-01-04,5
1,2011-05-06,9
1,2011-08-07,3
1,2011-12-08,10
2,2011-03-09,4
3,2011-02-10,7
3,2011-10-11,78
3,2014-31-12,32", as.is = TRUE)
# split data into groups by material id
dataByMaterialId <- split(Data, Data$MaterialID)
# create an xts object for each id
xts_list <- lapply(dataByMaterialId, function(id) {
  names <- list(NULL, paste0("Qty.", id$MaterialID[1]))
  xts(id$Quantity, as.Date(id$Date, "%Y-%d-%m"), dimnames = names)
})
# use do.call + merge to combine all your xts objects into one object
xts_merged <- do.call(merge, c(xts_list, fill = 0))
#            Qty.1 Qty.2 Qty.3
# 2011-04-01    13     0     0
# 2011-04-01     5     0     0
# 2011-06-05     9     0     0
# 2011-07-08     3     0     0
# 2011-08-12    10     0     0
# 2011-09-03     0     4     0
# 2011-10-02     0     0     7
# 2011-11-10     0     0    78
# 2014-12-31     0     0    32
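
The merged object above only has rows for dates on which something was traded. One possible way to also get a zero row for every other calendar day is to merge with a zero-width xts object that carries nothing but the full daily index. The sketch below does that on the dput(Data) sample from the question. It assumes the xts package is installed, and the names all_days, calendar and xts_full are made up for illustration.

```r
library(xts)  # also attaches zoo, which provides index()

# The sample from dput(Data) in the question (dates in %Y-%m-%d form)
Data <- data.frame(
  MaterialID = c(1L, 1L, 2L, 3L, 1L),
  Date = c("2011-01-04", "2011-01-04", "2011-01-07",
           "2011-01-09", "2011-01-11"),
  Quantity = c(13L, 5L, 9L, 3L, 10L)
)

# One single-column xts object per material, as in the answer above
xts_list <- lapply(split(Data, Data$MaterialID), function(d) {
  xts(d$Quantity, as.Date(d$Date),
      dimnames = list(NULL, paste0("MaterialID_", d$MaterialID[1])))
})
xts_merged <- do.call(merge, c(xts_list, fill = 0))

# Zero-width xts object: an index of every day in the range, no data columns
all_days <- seq(as.Date("2011-01-04"), as.Date("2014-12-31"), by = "day")
calendar <- xts(, order.by = all_days)

# Outer join against the calendar; days without transactions are filled with 0
xts_full <- merge(xts_merged, calendar, fill = 0)
head(xts_full, 9)
```

The date range is hard-coded here because the question states it. With real data you might derive it from range(index(xts_merged)) instead.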

r

time-series

zoo