Incorporating an external regressor in a hierarchical/grouped time series
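Dear contributors,
I have been working with hierarchical time series concerning the same set of products sold across a number of stores. In my case, when we aggregate the data set by two attributes such as "store" and "product_type", we should aggregate the target variable, i.e. the "demand" for each product, within each group or level of the hierarchy.
What I would like to do is add another variable, a categorical one, to my model, say a "dynamic harmonic regression", since I am working with weekly time series. However, I do not know how to include it when my external variable is categorical with 4 levels. I am wondering how I could aggregate it, or whether there is anything else I can do.
Here is a small reproducible example: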
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fable)
library(fabletools)
library(fpp3)
library(readxl)
library(fable.prophet)
library(feasts)
store <- c(rep('st1', 8), rep('st2', 8))
product_type <- c(rep('type1', 4), rep('type2', 4), rep('type1', 4), rep('type2', 4))
products <- c(rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2),
              rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2))
demands <- c(round(sample(c(1:100), 16, replace = TRUE)))
external_reg <- c(sample(c('red', 'green', 'blue'), 16, replace = TRUE))
date_week <- rep(1:4, 4)
date_year <- rep(2019:2022, 4)
my_data <- tibble(date_year, date_week, store, product_type, products, demands, external_reg)
my_data %>%
  mutate(Date = ymd(paste0(date_year, "-01-01")) + weeks(date_week - 1)) %>%
  mutate(Week = yearweek(Date)) %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(store * product_type, Demand_Agg = sum(demands))
Clearly, the external regressor should be a column in my tsibble, but it is missing from the result:
# A tsibble: 36 x 4 [53W]
# Key:       store, product_type [9]
       Week store        product_type Demand_Agg
     <week> <chr*>       <chr*>            <dbl>
 1 2019 W01 <aggregated> <aggregated>        188
 2 2020 W02 <aggregated> <aggregated>        142
 3 2021 W02 <aggregated> <aggregated>        259
 4 2022 W03 <aggregated> <aggregated>        186
 5 2019 W01 st1          <aggregated>         89
 6 2019 W01 st2          <aggregated>         99
 7 2020 W02 st1          <aggregated>         52
 8 2020 W02 st2          <aggregated>         90
 9 2021 W02 st1          <aggregated>         95
10 2021 W02 st2          <aggregated>        164
# … with 26 more rows
Thank you very much.
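The external regressor column (external_reg) has been dropped from your output because you have not specified how it should be aggregated. Given that it is a discrete variable, aggregating the data in a way that preserves this information can be tricky. How you aggregate it is up to you, and may depend on the model you intend to use. If you had a continuous variable such as temperature, you might want to compute the mean temperature.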
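To illustrate the kind of choice described above, here is a minimal sketch of two summaries one might try for a categorical column inside aggregate_key(): the most frequent level, and the share of each level as numeric columns. The toy tibble (with four consecutive weeks rather than the dates in the question), the helper most_common() and the column names reg_mode and share_* are assumptions made for this sketch; whether either summary is sensible depends on the model.

```r
library(dplyr)
library(tsibble)
library(fabletools)

# Toy data (assumed layout): 2 stores x 2 product types x 4 consecutive weeks.
toy <- tibble(
  Week = yearweek(rep(as.Date("2018-12-31") + 7 * 0:3, times = 4)),
  store = rep(c("st1", "st2"), each = 8),
  product_type = rep(rep(c("type1", "type2"), each = 4), times = 2),
  demands = c(45, 12, 70, 66, 77, 6, 27, 52, 8, 73, 70, 27, 84, 100, 79, 51),
  external_reg = c("blue", "green", "red", "blue", "green", "blue", "red", "green",
                   "blue", "blue", "green", "green", "red", "green", "blue", "green")
)

# Hypothetical helper: most frequent level; ties go to the alphabetically first level.
most_common <- function(x) names(which.max(table(x)))

agg <- toy %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(
    store * product_type,
    Demand_Agg  = sum(demands),
    reg_mode    = most_common(external_reg),      # one label per aggregated series
    share_red   = mean(external_reg == "red"),    # numeric shares keep more information
    share_green = mean(external_reg == "green"),
    share_blue  = mean(external_reg == "blue")
  )

agg
```

The share columns are numeric, so they can be used as ordinary regressors at every level of the hierarchy, whereas a single label such as reg_mode discards how mixed the underlying series were.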
For example, if you want to keep the first value of external_reg, you can aggregate it with aggregate_key(<tsibble>, store * product_type, Demand_Agg = sum(demands), external_reg = first(external_reg)):
library(dplyr)
library(fable)
library(tsibble)
library(lubridate)
my_data <- structure(list(date_year = c(2019L, 2020L, 2021L, 2022L, 2019L,
2020L, 2021L, 2022L, 2019L, 2020L, 2021L, 2022L, 2019L, 2020L,
2021L, 2022L), date_week = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), store = c("st1", "st1", "st1",
"st1", "st1", "st1", "st1", "st1", "st2", "st2", "st2", "st2",
"st2", "st2", "st2", "st2"), product_type = c("type1", "type1",
"type1", "type1", "type2", "type2", "type2", "type2", "type1",
"type1", "type1", "type1", "type2", "type2", "type2", "type2"
), products = c("A", "A", "B", "B", "C", "C", "D", "D", "A",
"A", "B", "B", "C", "C", "D", "D"), demands = c(45, 12, 70, 66,
77, 6, 27, 52, 8, 73, 70, 27, 84, 100, 79, 51), external_reg = c("blue",
"green", "red", "blue", "green", "blue", "red", "green", "blue",
"blue", "green", "green", "red", "green", "blue", "green")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -16L))
my_data %>%
  mutate(Date = ymd(paste0(date_year, "-01-01")) + weeks(date_week - 1)) %>%
  mutate(Week = yearweek(Date)) %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(
    store * product_type,
    Demand_Agg = sum(demands),
    external_reg = first(external_reg)
  )
#> # A tsibble: 36 x 5 [53W]
#> # Key:       store, product_type [9]
#>        Week store        product_type Demand_Agg external_reg
#>      <week> <chr*>       <chr*>            <dbl> <chr>
#>  1 2019 W01 <aggregated> <aggregated>        214 blue
#>  2 2020 W02 <aggregated> <aggregated>        191 green
#>  3 2021 W02 <aggregated> <aggregated>        246 red
#>  4 2022 W03 <aggregated> <aggregated>        196 blue
#>  5 2019 W01 st1          <aggregated>        122 blue
#>  6 2019 W01 st2          <aggregated>         92 blue
#>  7 2020 W02 st1          <aggregated>         18 green
#>  8 2020 W02 st2          <aggregated>        173 blue
#>  9 2021 W02 st1          <aggregated>         97 red
#> 10 2021 W02 st2          <aggregated>        149 green
#> # … with 26 more rows
Created on 2022-05-08 by the reprex package (v2.0.1)
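If the aim is a dynamic harmonic regression on the aggregated series, one possible approach (an assumption, not something tested on the real data) is to turn the categorical column into numeric level shares during aggregation and pass them to ARIMA() alongside fourier() terms. In the sketch below the simulated data, the column names share_red and share_green, and the choice K = 2 are all hypothetical.

```r
library(dplyr)
library(tsibble)
library(fable)

set.seed(123)

# Simulated data (an assumption): 2 stores x 2 product types, 104 regular weeks.
sim <- expand.grid(
  week_id = 0:103,
  store = c("st1", "st2"),
  product_type = c("type1", "type2"),
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(
    Week = yearweek(as.Date("2018-12-31") + 7 * week_id),
    external_reg = sample(c("red", "green", "blue"), n(), replace = TRUE),
    # Yearly seasonal pattern plus a made-up uplift when the level is "red".
    demands = 50 + 10 * sin(2 * pi * week_id / 52) +
      5 * (external_reg == "red") + rnorm(n(), sd = 3)
  )

# Aggregate demand by sum and the categorical column by level shares.
# "blue" is left out as the reference level so the shares are not collinear
# with the intercept.
agg <- sim %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(
    store * product_type,
    Demand_Agg  = sum(demands),
    share_red   = mean(external_reg == "red"),
    share_green = mean(external_reg == "green")
  )

# Dynamic harmonic regression: Fourier terms for seasonality, ARIMA errors,
# seasonal ARIMA terms switched off with PDQ(0, 0, 0).
fit <- agg %>%
  model(
    dhr = ARIMA(Demand_Agg ~ PDQ(0, 0, 0) + fourier(K = 2) + share_red + share_green)
  )

fit
```

To forecast from such a model, future values of the share columns have to be supplied through new_data, so this only helps when the regressor is known or can be planned in advance.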