使用 ggplot2 和 funggcast 函数进行预测
Forecast with ggplot2 and funggcast function
在 this website, Mr. Davenport published a function to plot an arima forecast with ggplot2
on the example of an arbitrary dataset, he published here。我可以按照他的例子做,不会出现任何错误信息。
现在,当我使用我的数据时,我会以警告结束:
1: In window.default(x, ...) : 'end' value not changed
2: In window.default(x, ...) : 'end' value not changed
我知道当我调用此命令 pd <- funggcast(yt, yfor)
时会发生这种情况,因为我在数据 end = c(2013)
中指示的数据存在问题。但我不知道如何解决。
这是我使用的代码:
library(ggplot2)
library(zoo)
library(forecast)
myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)
}
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(myts) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(yt, yfor) # extract the data for ggplot using function funggcast()
ggplot(data = pd, aes(x = date,y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast"))
编辑: 我找到了另一个博客 here,其中包含与 forecast
和 ggplot2
一起使用的功能,但我想使用上面的方法,如果我能找到我的错误。有人吗?
编辑2:
如果我 运行 你的更新代码和我的数据 here,那么我会得到下面的图表。请注意,我没有将 end = c(2023)
更改为 mtys
,否则它不会将预测值与拟合值合并。
myts <- ts(WDI_gdp_capita$Brazil, start = c(1960), end = c(2023), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)
} # ggplot function by Frank Davenport
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()
ggplot(data = pd, aes(x = date, y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast")) + ggsave((filename = "gdp_forecast_ggplot.pdf"), width=330, height=180, units=c("mm"), dpi = 300, limitsize = TRUE)
我得到的近乎完美的图表:
另外一个问题:我怎样才能得到这张图中的图例?
如果我为 myts
设置 end = c(2013)
,我会得到与开始时相同的图形:
达文波特先生的分析与您试图制作的情节有几点不同。
第一个是他将 arima 预测与一些观察到的数据进行比较,这就是为什么他在整个时间序列的一部分(训练集)上训练模型。
为此,您应该延长初始时间序列:
myts <- ts(rnorm(55), start = c(1960), end = c(2023), freq = 1)
然后在脚本的末尾,select 到 2013 年的培训:
yt <- window(myts, end = c(2013)) # extract training data until last year
模型应该在训练集上进行训练,而不是整个时间序列,所以你应该将 yfit 行更改为:
yfit <- auto.arima(yt) # fit arima model
并使用整个时间序列调用 funggcast 函数,因为它需要观察和拟合数据:
pd <- funggcast(myts, yfor)
最后,他使用了包含月份和年份的日期,因此在他的 funggcast
函数中,更改此行:
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
收件人:
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
这是因为模型预测的值需要更改为日期,如2014年必须更改为2014-01-01,以便与观测数据合并。
所有更改后,代码如下所示:
library(ggplot2)
library(zoo)
library(forecast)
myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all= T) # final data.frame for use in ggplot
return(pd)
}
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()
plotData<-ggplot(data = pd, aes(x = date, y = observed)) + geom_line(aes(color = "1")) +
geom_line(aes(y = fitted,color="2")) +
geom_line(aes(y = forecast,color="3")) +
scale_colour_manual(values=c("red", "blue","black"),labels = c("Observed", "Fitted", "Forecasted"),name="Data")+
geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25)+
scale_x_date(name = "Time in Decades") +
scale_y_continuous(name = "GDP per capita (current US$)")+
theme(axis.text.x = element_text(size = 10)) +
ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)")
plotData
你会得到一个看起来像这样的图,对于完全 运行dom 时间序列来说拟合非常糟糕。 ggplot 也会输出一些错误,因为预测线在 2013 年之前没有数据并且拟合数据在 2013 年之后没有继续。(我 运行 它几次,取决于初始, 运行dom 时间序列, 该模型可能到处都预测为 0)
编辑:也更改了 pd
分配行,以防 2013 年之后没有观察到的数据
Edit2:我更改了代码末尾的 ggplot 函数以确保图例显示
有一个名为 ggfortify 的包可通过 github 获得,它允许使用 ggplot2 直接绘制预测对象。可以在 http://rpubs.com/sinhrks/plot_ts
上找到
这是一个相当老的 post 的凸起,但是 fuction in github 产生了一些不错的结果。
这是 2016 年 8 月 3 日的代码:
function(forec.obj, data.color = 'blue', fit.color = 'red', forec.color = 'black',
lower.fill = 'darkgrey', upper.fill = 'grey', format.date = F)
{
serie.orig = forec.obj$x
serie.fit = forec.obj$fitted
pi.strings = paste(forec.obj$level, '%', sep = '')
if(format.date)
dates = as.Date(time(serie.orig))
else
dates = time(serie.orig)
serie.df = data.frame(date = dates, serie.orig = serie.orig, serie.fit = serie.fit)
forec.M = cbind(forec.obj$mean, forec.obj$lower[, 1:2], forec.obj$upper[, 1:2])
forec.df = as.data.frame(forec.M)
colnames(forec.df) = c('forec.val', 'l0', 'l1', 'u0', 'u1')
if(format.date)
forec.df$date = as.Date(time(forec.obj$mean))
else
forec.df$date = time(forec.obj$mean)
p = ggplot() +
geom_line(aes(date, serie.orig, colour = 'data'), data = serie.df) +
geom_line(aes(date, serie.fit, colour = 'fit'), data = serie.df) +
scale_y_continuous() +
geom_ribbon(aes(x = date, ymin = l0, ymax = u0, fill = 'lower'), data = forec.df, alpha = I(0.4)) +
geom_ribbon(aes(x = date, ymin = l1, ymax = u1, fill = 'upper'), data = forec.df, alpha = I(0.3)) +
geom_line(aes(date, forec.val, colour = 'forecast'), data = forec.df) +
scale_color_manual('Series', values=c('data' = data.color, 'fit' = fit.color, 'forecast' = forec.color)) +
scale_fill_manual('P.I.', values=c('lower' = lower.fill, 'upper' = upper.fill))
if (format.date)
p = p + scale_x_date()
p
}
在 this website, Mr. Davenport published a function to plot an arima forecast with ggplot2
on the example of an arbitrary dataset, he published here。我可以按照他的例子做,不会出现任何错误信息。
现在,当我使用我的数据时,我会以警告结束:
1: In window.default(x, ...) : 'end' value not changed
2: In window.default(x, ...) : 'end' value not changed
我知道当我调用此命令 pd <- funggcast(yt, yfor)
时会发生这种情况,因为我在数据 end = c(2013)
中指示的数据存在问题。但我不知道如何解决。
这是我使用的代码:
library(ggplot2)
library(zoo)
library(forecast)
myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)
}
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(myts) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(yt, yfor) # extract the data for ggplot using function funggcast()
ggplot(data = pd, aes(x = date,y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast"))
编辑: 我找到了另一个博客 here,其中包含与 forecast
和 ggplot2
一起使用的功能,但我想使用上面的方法,如果我能找到我的错误。有人吗?
编辑2:
如果我 运行 你的更新代码和我的数据 here,那么我会得到下面的图表。请注意,我没有将 end = c(2023)
更改为 mtys
,否则它不会将预测值与拟合值合并。
myts <- ts(WDI_gdp_capita$Brazil, start = c(1960), end = c(2023), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all.x = T) # final data.frame for use in ggplot
return(pd)
} # ggplot function by Frank Davenport
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()
ggplot(data = pd, aes(x = date, y = observed)) + geom_line(color = "red") + geom_line(aes(y = fitted), color = "blue") + geom_line(aes(y = forecast)) + geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25) + scale_x_date(name = "Time in Decades") + scale_y_continuous(name = "GDP per capita (current US$)") + theme(axis.text.x = element_text(size = 10), legend.justification=c(0,1), legend.position=c(0,1)) + ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)") + scale_color_manual(values = c("Blue", "Red"), breaks = c("Fitted", "Data", "Forecast")) + ggsave((filename = "gdp_forecast_ggplot.pdf"), width=330, height=180, units=c("mm"), dpi = 300, limitsize = TRUE)
我得到的近乎完美的图表:
另外一个问题:我怎样才能得到这张图中的图例?
如果我为 myts
设置 end = c(2013)
,我会得到与开始时相同的图形:
达文波特先生的分析与您试图制作的情节有几点不同。 第一个是他将 arima 预测与一些观察到的数据进行比较,这就是为什么他在整个时间序列的一部分(训练集)上训练模型。 为此,您应该延长初始时间序列:
myts <- ts(rnorm(55), start = c(1960), end = c(2023), freq = 1)
然后在脚本的末尾,select 到 2013 年的培训:
yt <- window(myts, end = c(2013)) # extract training data until last year
模型应该在训练集上进行训练,而不是整个时间序列,所以你应该将 yfit 行更改为:
yfit <- auto.arima(yt) # fit arima model
并使用整个时间序列调用 funggcast 函数,因为它需要观察和拟合数据:
pd <- funggcast(myts, yfor)
最后,他使用了包含月份和年份的日期,因此在他的 funggcast
函数中,更改此行:
dfcastn$date <- as.Date(as.yearmon(row.names(dfcastn)))
收件人:
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
这是因为模型预测的值需要更改为日期,如2014年必须更改为2014-01-01,以便与观测数据合并。
所有更改后,代码如下所示:
library(ggplot2)
library(zoo)
library(forecast)
myts <- ts(rnorm(55), start = c(1960), end = c(2013), freq = 1)
funggcast <- function(dn, fcast){
en <- max(time(fcast$mean)) # Extract the max date used in the forecast
# Extract Source and Training Data
ds <- as.data.frame(window(dn, end = en))
names(ds) <- 'observed'
ds$date <- as.Date(time(window(dn, end = en)))
# Extract the Fitted Values (need to figure out how to grab confidence intervals)
dfit <- as.data.frame(fcast$fitted)
dfit$date <- as.Date(time(fcast$fitted))
names(dfit)[1] <- 'fitted'
ds <- merge(ds, dfit, all.x = T) # Merge fitted values with source and training data
# Extract the Forecast values and confidence intervals
dfcastn <- as.data.frame(fcast)
dfcastn$date <- as.Date(paste(row.names(dfcastn),"01","01",sep="-"))
names(dfcastn) <- c('forecast','lo80','hi80','lo95','hi95','date')
pd <- merge(ds, dfcastn,all= T) # final data.frame for use in ggplot
return(pd)
}
yt <- window(myts, end = c(2013)) # extract training data until last year
yfit <- auto.arima(yt) # fit arima model
yfor <- forecast(yfit) # forecast
pd <- funggcast(myts, yfor) # extract the data for ggplot using function funggcast()
plotData<-ggplot(data = pd, aes(x = date, y = observed)) + geom_line(aes(color = "1")) +
geom_line(aes(y = fitted,color="2")) +
geom_line(aes(y = forecast,color="3")) +
scale_colour_manual(values=c("red", "blue","black"),labels = c("Observed", "Fitted", "Forecasted"),name="Data")+
geom_ribbon(aes(ymin = lo95, ymax = hi95), alpha = .25)+
scale_x_date(name = "Time in Decades") +
scale_y_continuous(name = "GDP per capita (current US$)")+
theme(axis.text.x = element_text(size = 10)) +
ggtitle("Arima(0,1,1) Fit and Forecast of GDP per capita for Brazil (1960-2013)")
plotData
你会得到一个看起来像这样的图,对于完全 运行dom 时间序列来说拟合非常糟糕。 ggplot 也会输出一些错误,因为预测线在 2013 年之前没有数据并且拟合数据在 2013 年之后没有继续。(我 运行 它几次,取决于初始, 运行dom 时间序列, 该模型可能到处都预测为 0)
编辑:也更改了 pd
分配行,以防 2013 年之后没有观察到的数据
Edit2:我更改了代码末尾的 ggplot 函数以确保图例显示
有一个名为 ggfortify 的包可通过 github 获得,它允许使用 ggplot2 直接绘制预测对象。可以在 http://rpubs.com/sinhrks/plot_ts
上找到这是一个相当老的 post 的凸起,但是 fuction in github 产生了一些不错的结果。
这是 2016 年 8 月 3 日的代码:
function(forec.obj, data.color = 'blue', fit.color = 'red', forec.color = 'black',
lower.fill = 'darkgrey', upper.fill = 'grey', format.date = F)
{
serie.orig = forec.obj$x
serie.fit = forec.obj$fitted
pi.strings = paste(forec.obj$level, '%', sep = '')
if(format.date)
dates = as.Date(time(serie.orig))
else
dates = time(serie.orig)
serie.df = data.frame(date = dates, serie.orig = serie.orig, serie.fit = serie.fit)
forec.M = cbind(forec.obj$mean, forec.obj$lower[, 1:2], forec.obj$upper[, 1:2])
forec.df = as.data.frame(forec.M)
colnames(forec.df) = c('forec.val', 'l0', 'l1', 'u0', 'u1')
if(format.date)
forec.df$date = as.Date(time(forec.obj$mean))
else
forec.df$date = time(forec.obj$mean)
p = ggplot() +
geom_line(aes(date, serie.orig, colour = 'data'), data = serie.df) +
geom_line(aes(date, serie.fit, colour = 'fit'), data = serie.df) +
scale_y_continuous() +
geom_ribbon(aes(x = date, ymin = l0, ymax = u0, fill = 'lower'), data = forec.df, alpha = I(0.4)) +
geom_ribbon(aes(x = date, ymin = l1, ymax = u1, fill = 'upper'), data = forec.df, alpha = I(0.3)) +
geom_line(aes(date, forec.val, colour = 'forecast'), data = forec.df) +
scale_color_manual('Series', values=c('data' = data.color, 'fit' = fit.color, 'forecast' = forec.color)) +
scale_fill_manual('P.I.', values=c('lower' = lower.fill, 'upper' = upper.fill))
if (format.date)
p = p + scale_x_date()
p
}