Why are forecast values slightly lower with `fable::TSLM()` than `stats::lm()`?
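I'm doing some work that involves modelling values over time, and for clarity I'd like to use the fable package for it. I want to fit a linear model over time with a log transformation. However, I've found that in some cases the values produced by fable::TSLM() differ noticeably from those produced by stats::lm(), which I was previously using for the model. The problem may be caused by my misuse of the fable functions, but it could also be a bug in the package. The following reprex illustrates the problem: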
library(tsibble)
library(fable)
library(dplyr)
library(tidyr) # Not essential
library(ggplot2) # Not essential

# Create a toy dataset
test_data <- tsibble(
  Month = yearmonth("2020 Jan") + 0:11,
  # Month_Number will be used to fit a `stats` style model
  Month_Number = 1:12,
  Value = c(100, 95, 91, 75, 89, 85, 82, 75, 62, 57, 58, 50),
  index = Month
)

# Create a `fable` style model
fable_model <- test_data %>%
  fabletools::model(m = TSLM(log(Value) ~ trend()))

# Generate modelled values using `fable`
modelled_values <- fable_model %>%
  augment() %>%
  mutate(Type = "Modelled") %>%
  rename(Fable_Model = .fitted, Actual_Value = Value) %>%
  select(-.resid) %>%
  as_tsibble()

# Generate forecasted values using `fable`
future_values <- fable_model %>%
  forecast(h = 12, point_forecast = list(Fable_Model = mean)) %>%
  mutate(Type = "Forecast") %>%
  as_tsibble() %>%
  select(-Value)

# Generate a `stats` style model
exp_model <- lm(log(Value) ~ Month_Number, data = test_data)

# Bind the modelled and forecast `fable` values together
all_values <- bind_rows(modelled_values, future_values) %>%
  # Mutate a column of `stats` predicted values
  mutate(Stats_Model = exp(predict(exp_model, newdata = tibble(Month_Number = 1:24))))

# Check out the mean difference in predictions - these are negligible for modelled values but are
# much more significant for forecasted values.
all_values %>%
  as_tibble() %>%
  group_by(Type) %>%
  summarise(Mean_Difference = mean(abs(Fable_Model - Stats_Model)), .groups = "drop")
#> # A tibble: 2 x 2
#>   Type     Mean_Difference
#>   <chr>              <dbl>
#> 1 Forecast        2.91e- 1
#> 2 Modelled        3.79e-14

# Can also visualise the differences with this code:
all_values %>%
  pivot_longer(c(Actual_Value, Fable_Model, Stats_Model), names_to = "Series", values_to = "Value") %>%
  ggplot(aes(x = as_date(Month), y = Value, colour = Series)) +
  geom_line()
Created on 2020-12-10 by the reprex package (v0.3.0)
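As described in the link, fable applies a correction when back-transforming forecasts of transformed data, so that the point forecast is the mean rather than the median. I think that is where the difference comes from, because you are using a log transformation: on the log scale the forecast distribution is symmetric, so its mean and median coincide, but after back-transforming with exp() the distribution is right-skewed and its mean lies above its median.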
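To make the correction concrete, here is a sketch assuming normal errors on the log scale (the assumption behind the prediction intervals of both `TSLM()` and `lm()`), so that the h-step forecast satisfies the first line below, with $\hat\mu_h$ the log-scale point forecast and $\sigma_h^2$ the log-scale forecast variance. The median is what `exp(predict(exp_model, ...))` returns; the mean is either the exact log-normal mean or its second-order Taylor approximation (the form given in Hyndman & Athanasopoulos, *Forecasting: Principles and Practice*). I have not checked which of the two a given fable version uses.

```latex
\begin{aligned}
\log y_{T+h} \mid y_{1:T} &\sim \mathcal{N}\!\left(\hat\mu_h,\ \sigma_h^2\right),
\qquad \sigma_h^2 = \hat\sigma^2\left(1 + \mathbf{x}_h^\top (X^\top X)^{-1}\mathbf{x}_h\right) \\
\operatorname{median}(y_{T+h}) &= e^{\hat\mu_h} \\
\mathbb{E}[y_{T+h}] &= e^{\hat\mu_h + \sigma_h^2/2} \;\approx\; e^{\hat\mu_h}\left(1 + \tfrac{\sigma_h^2}{2}\right)
\end{aligned}
```

Because $\sigma_h^2 > 0$, the mean always exceeds the median, and the ratio mean/median, $e^{\sigma_h^2/2}$, grows with the horizon as $\sigma_h^2$ grows.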
Note that if you use point_forecast = list(Fable_Model = median), both models give the same result.
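A base-R sketch that checks these relationships on the question's toy data without fable. Assumptions: the log-scale forecast variance is taken as `se.fit^2` plus the residual variance reported by `predict.lm()`, and the variable names (`median_fc`, `mean_exact`, `mean_taylor`, ...) are my own, hypothetical names. It shows that `exp()` of the `lm()` prediction equals the log-normal median computed with `qlnorm()`, while both forms of the mean lie above it.

```r
# Toy data copied from the question
value <- c(100, 95, 91, 75, 89, 85, 82, 75, 62, 57, 58, 50)
train <- data.frame(Month_Number = 1:12, Value = value)

# Same log-linear model as in the question
exp_model <- lm(log(Value) ~ Month_Number, data = train)

# Forecast 12 steps ahead on the log scale, keeping standard errors
new_data <- data.frame(Month_Number = 13:24)
pred <- predict(exp_model, newdata = new_data, se.fit = TRUE)

mu <- pred$fit                                     # log-scale point forecast
sigma2_h <- pred$se.fit^2 + pred$residual.scale^2  # log-scale forecast variance (assumption)

# Back-transformed median: what exp(predict(...)) in the question computes
median_fc <- exp(mu)
# Median of the log-normal forecast distribution; should match median_fc
median_q <- qlnorm(0.5, meanlog = mu, sdlog = sqrt(sigma2_h))

# Back-transformed mean: exact log-normal mean and second-order Taylor approximation
mean_exact <- exp(mu + sigma2_h / 2)
mean_taylor <- exp(mu) * (1 + sigma2_h / 2)

print(round(cbind(median_fc, mean_taylor, mean_exact), 2))
```

Comparing `mean_taylor` and `mean_exact` with the `Fable_Model` forecasts from the question should indicate which form the installed fable version uses; I have not verified that here.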
So I'd guess fable is right.