R: gathering multivariate time series

Question

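I have a data frame that I want to convert into a time series. The problem is that I have several products for each date. It looks like this: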

    Name_Article Week Num_Any Spending Unit_Price
1      Product_A   1    2016   196.05     3.376000
2      Product_B   1    2016   377.04     1.004867
3      Product_A   2    2016  2979.40     3.376000
4      Product_C   2    2016   353.44     3.034444
5      Product_D   2    2016   160.99     0.653621
6      Product_E   2    2016   950.22     1.441164
7      Product_A   3    2016   196.05     3.376000
8      Product_B   3    2016   377.04     1.004867
9      Product_D   3    2016  2979.40     0.653621
10     Product_E   3    2016   353.44     1.441164
11     Product_A   4    2016   160.99     3.376000
12     Product_B   4    2016   950.22     1.441164

I know that working with weekly time series is not ideal, but I have no choice. My idea is to end up with something like this:

  Week Spending.A UnitPrice.A Spending.B UnitPrice.B Spending.C UnitPrice.C ...
    1      196.05    3.376000     377.04    1.004867        0.00   3.034444
    2     2979.40    3.376000       0.00    1.004867      353.44   3.034444
    3      120.05    3.376000     377.04    1.004867        0.00   3.950000
    4      160.99    3.500000     950.22    1.441164    ...

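I can't get my head around the tidyr functions gather() and spread(). Any help would be much appreciated!

In case you are wondering, the goal of all this is to perform hierarchical forecasting, but before I can start I need to get my data into the right structure.

Thanks a lot!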

Answer 1

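Remove everything up to and including the underscore in Name_Article and drop the year column (Num_Any). Then read the result in as a zoo object, splitting on Name_Article and specifying that Week is the index.

If you need a particular form, various conversions are available, such as as.ts(z), fortify.zoo(z), coredata(z), index(z) and zoo(coredata(z)).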

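library(zoo)

# Keep only the product letter ("Product_A" -> "A") and drop column 3 (Num_Any)
DF2 <- transform(DF, Name_Article = sub(".*_", "", Name_Article))[-3]
# One set of Spending/Unit_Price columns per product, indexed by Week
z <- read.zoo(DF2, index = "Week", split = "Name_Article", FUN = identity)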

giving:

> z
  Spending.A Unit_Price.A Spending.B Unit_Price.B Spending.C Unit_Price.C
1     196.05        3.376     377.04     1.004867         NA           NA
2    2979.40        3.376         NA           NA     353.44     3.034444
3     196.05        3.376     377.04     1.004867         NA           NA
4     160.99        3.376     950.22     1.441164         NA           NA
  Spending.D Unit_Price.D Spending.E Unit_Price.E
1         NA           NA         NA           NA
2     160.99     0.653621     950.22     1.441164
3    2979.40     0.653621     353.44     1.441164
4         NA           NA         NA           NA
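
The layout asked for in the question shows 0.00 for Spending in weeks where a product is absent and a filled-in Unit_Price, whereas z above has NA in those cells. A minimal sketch of one way to get closer to that layout, and of two of the conversions mentioned above, follows. Treating a missing Spending as 0 and carrying the nearest known Unit_Price forward and backward are assumptions read off the question's example, not something the data itself guarantees. The names m, z0, df_wide and tz are made up for this illustration.

```r
library(zoo)

# Same sample data as in the Note, without the row-number column
DF <- read.table(header = TRUE, text = "
Name_Article Week Num_Any Spending Unit_Price
Product_A 1 2016  196.05 3.376000
Product_B 1 2016  377.04 1.004867
Product_A 2 2016 2979.40 3.376000
Product_C 2 2016  353.44 3.034444
Product_D 2 2016  160.99 0.653621
Product_E 2 2016  950.22 1.441164
Product_A 3 2016  196.05 3.376000
Product_B 3 2016  377.04 1.004867
Product_D 3 2016 2979.40 0.653621
Product_E 3 2016  353.44 1.441164
Product_A 4 2016  160.99 3.376000
Product_B 4 2016  950.22 1.441164")

DF2 <- transform(DF, Name_Article = sub(".*_", "", Name_Article))[-3]
z <- read.zoo(DF2, index = "Week", split = "Name_Article", FUN = identity)

# Assumption: no row for a product in a given week means nothing was spent
m <- coredata(z)                           # plain numeric matrix
spend <- grepl("^Spending", colnames(m))   # TRUE for the Spending.* columns
m[, spend][is.na(m[, spend])] <- 0
z0 <- zoo(m, index(z))

# Assumption: a missing price equals the last known one, or failing that
# the next known one; only the Unit_Price.* columns still contain NA here
z0 <- na.locf(z0, na.rm = FALSE)
z0 <- na.locf(z0, fromLast = TRUE, na.rm = FALSE)

df_wide <- fortify.zoo(z0)   # data frame with the index as its first column
tz <- as.ts(z0)              # multivariate ts object
```

If the zeros or the carried prices are not appropriate for the forecasting step, skip the corresponding lines and keep the NA values instead.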

Note

The data, in reproducible form, is assumed to be:

Lines <- "
    Name_Article Week Num_Any Spending Unit_Price
1      Product_A   1    2016   196.05     3.376000
2      Product_B   1    2016   377.04     1.004867
3      Product_A   2    2016  2979.40     3.376000
4      Product_C   2    2016   353.44     3.034444
5      Product_D   2    2016   160.99     0.653621
6      Product_E   2    2016   950.22     1.441164
7      Product_A   3    2016   196.05     3.376000
8      Product_B   3    2016   377.04     1.004867
9      Product_D   3    2016  2979.40     0.653621
10     Product_E   3    2016   353.44     1.441164
11     Product_A   4    2016   160.99     3.376000
12     Product_B   4    2016   950.22     1.441164"
DF <- read.table(text = Lines, header = TRUE)
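
Since the question mentions tidyr, here is a rough sketch of the same reshaping with pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. It is an alternative to the zoo approach, not part of it. The name wide is made up for this illustration, and filling missing Spending with 0 is again an assumption taken from the layout in the question.

```r
library(tidyr)

# Same sample data as above, without the row-number column
DF <- read.table(header = TRUE, text = "
Name_Article Week Num_Any Spending Unit_Price
Product_A 1 2016  196.05 3.376000
Product_B 1 2016  377.04 1.004867
Product_A 2 2016 2979.40 3.376000
Product_C 2 2016  353.44 3.034444
Product_D 2 2016  160.99 0.653621
Product_E 2 2016  950.22 1.441164
Product_A 3 2016  196.05 3.376000
Product_B 3 2016  377.04 1.004867
Product_D 3 2016 2979.40 0.653621
Product_E 3 2016  353.44 1.441164
Product_A 4 2016  160.99 3.376000
Product_B 4 2016  950.22 1.441164")

# Keep only the product letter and drop the year column, as in the answer
DF2 <- transform(DF, Name_Article = sub(".*_", "", Name_Article))[-3]

# One row per Week; one Spending.* and one Unit_Price.* column per product
wide <- pivot_wider(
  DF2,
  names_from  = Name_Article,
  values_from = c(Spending, Unit_Price),
  names_sep   = ".",
  values_fill = list(Spending = 0)   # assumption: absent product means 0 spent
)
```

Unlike the zoo result, the columns come out grouped by variable (all Spending.* columns first, then all Unit_Price.* columns), and Unit_Price stays NA where a product has no row for that week.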
