R:按时间序列数据的列组应用多个函数
R: applying multiple functions by group of columns on time series data
我有一个每小时级别的时间序列数据。我正在尝试为该数据建立预测。以下是数据样本:
sample <-
structure(list(group_type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Group 1",
"Group 2", "Group 5"), class = "factor"), sub_group_type = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Sub Group 1", "Sub Group 2", "Sub Group 3"),
class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/17",
"1/2/17", "1/3/17"), class = "factor"), hour = c(6L, 7L, 8L, 9L, 10L, 11L, 12L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), weekday = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("Monday", "Sunday", "Tuesday"), class = "factor"), total = c(9L, 9L,
10L, 6L, 2L, 14L, 3L, 11L, 12L, 12L, 0L, 10L, 8L, 13L, 14L, 17L, 12L, 5L, 9L, 7L,
10L, 13L, 23L, 11L, 3L, 6L, 10L, 11L, 14L, 16L, 13L, 2L, 3L, 4L, 14L, 11L, 16L,
8L, 12L, 7L, 6L, 13L, 13L, 22L, 12L, 7L, 9L, 8L, 14L, 9L, 16L, 15L, 6L, 7L, 6L,
12L, 13L, 14L, 7L, 3L, 13L, 11L, 6L, 8L, 15L, 11L, 3L, 10L, 9L, 7L, 12L, 10L, 10L,
3L, 14L, 8L, 12L, 10L, 20L, 5L, 4L, 8L, 12L, 3L, 0L, 4L, 5L, 1L, 6L, 7L, 0L, 3L,
1L, 1L, 0L, 2L, 2L, 0L, 2L, 0L, 3L, 7L, 6L, 2L, 1L)), .Names = c("group_type",
"sub_group_type", "date", "hour", "weekday", "total"), class = "data.frame",
row.names = c(NA, -105L))
我正在对上述数据应用以下函数:
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=7, size=25)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=7)
mod_bats <- bats(x, ic='aicc', seasonal.periods=7)
mod_stl <- stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
}
test <- by(sample,list(sample$group_type,sample$sub_group_type,sample$date, sample$hour
),models)
但是,我收到以下错误:
Error in ets(x, ic = "aicc", restrict = T) : y should be a univariate time series
如果我按如下方式拆分数据并应用 ets() 函数,我可以 运行 它没有任何问题。但是,这种数据拆分对我来说不是一个非常可行的选择,因为组和子组的数量太多,而且每个组都有不同的时间序列模式:
sub_sample_1 <- sample[sample$group_type == "Group 1" & sample$sub_group_type == "Sub Group 1",6]
x <- msts(sub_sample_1, seasonal.periods=24)
mod_arima <- auto.arima(x, ic='aicc', stepwise=F)
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=24, size=10)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=24)
mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
mod_stl <- stlm(x, s.window=24, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
是否有任何变通方法可以让我按列组应用模型而不会遇到任何错误?
此外,并非所有模型都适用于所有组。对于 sub_sample_1 数据,HoltWinters、neuralnet、bats 和 stl 给我错误,其他人正在工作
> mod_hwa <- HoltWinters(x,seasonal = "additive")
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
time series has no or less than 2 periods
> mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
Error in HoltWinters(x, seasonal = "multiplicative") :
data must be non-zero for multiplicative Holt-Winters
> mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
Error in optim(par = param.vector$vect, fn = calcLikelihoodNOTransformed, :
function cannot be evaluated at initial parameters
我能理解为什么这些模型不适用于我的数据。当我应用该功能时出现错误时如何排除它们?
在此先感谢您的帮助!
这个问题与我的另一个问题相似(可能是扩展)here
您当前的设置出现了几个问题:
如果未指定 return()
, 函数 return 最后一行。因此,您的第一次尝试将丢失除 mod_sts
之外的所有行,这将为 by
.
的每个子集分配值 test
在您的子集代码中,您实际上传递了第 6 列(一个原子向量),而您在第一次代码尝试中传递了数据帧的所有列。这可能是您输入错误的原因 msts 文档:
A numeric vector, ts object, matrix or data frame. It is intended that
the time series data is univariate, otherwise treated the same as
ts().
您的 by
正在接收四个分组,group_type,sub_group_type、date 和 hour 与您的第二个子集代码不同。除非您的数据非常大,否则这么多分组可能会导致行数很少或没有行,因此正如您最后一个代码块所建议的那样,模型过程的数据点不足。
话虽如此,请考虑在 return 通过前两个分组对命名元素列表进行以下调整,指定第 6 列。并且由于 by
采用了多种因素的组合,这些因素在子集数据帧中可能不会产生任何行,因此下面使用 tryCatch
来捕获任何错误,并在最后一行中使用 return 空列表进行过滤。
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
list(
mod_exp = ets(x, ic='aicc', restrict=T),
mod_hwa = HoltWinters(x,seasonal = "additive"),
mod_hwm = HoltWinters(x,seasonal = "multiplicative"),
mod_neural = nnetar(x, p=7, size=25),
mod_tbats = tbats(x, ic='aicc', seasonal.periods=7),
mod_bats = bats(x, ic='aicc', seasonal.periods=7),
mod_stl = stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets'),
mod_sts = StructTS(x)
)
}
# TRY/CATCH TO CAPTURE ERRORS AND RETURN EMPTY LIST
test <- by(sample[,6], list(sample$group_type, sample$sub_group_type),
function(x) tryCatch({ models(x)
}, error=function(e) return(list(NA))))
# TO REMOVE NULLs AND NAs (EMPTY ITEMS)
test <- Filter(function(i) length(i) > 0, test)
我有一个每小时级别的时间序列数据。我正在尝试为该数据建立预测。以下是数据样本:
sample <-
structure(list(group_type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Group 1",
"Group 2", "Group 5"), class = "factor"), sub_group_type = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Sub Group 1", "Sub Group 2", "Sub Group 3"),
class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/17",
"1/2/17", "1/3/17"), class = "factor"), hour = c(6L, 7L, 8L, 9L, 10L, 11L, 12L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), weekday = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("Monday", "Sunday", "Tuesday"), class = "factor"), total = c(9L, 9L,
10L, 6L, 2L, 14L, 3L, 11L, 12L, 12L, 0L, 10L, 8L, 13L, 14L, 17L, 12L, 5L, 9L, 7L,
10L, 13L, 23L, 11L, 3L, 6L, 10L, 11L, 14L, 16L, 13L, 2L, 3L, 4L, 14L, 11L, 16L,
8L, 12L, 7L, 6L, 13L, 13L, 22L, 12L, 7L, 9L, 8L, 14L, 9L, 16L, 15L, 6L, 7L, 6L,
12L, 13L, 14L, 7L, 3L, 13L, 11L, 6L, 8L, 15L, 11L, 3L, 10L, 9L, 7L, 12L, 10L, 10L,
3L, 14L, 8L, 12L, 10L, 20L, 5L, 4L, 8L, 12L, 3L, 0L, 4L, 5L, 1L, 6L, 7L, 0L, 3L,
1L, 1L, 0L, 2L, 2L, 0L, 2L, 0L, 3L, 7L, 6L, 2L, 1L)), .Names = c("group_type",
"sub_group_type", "date", "hour", "weekday", "total"), class = "data.frame",
row.names = c(NA, -105L))
我正在对上述数据应用以下函数:
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=7, size=25)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=7)
mod_bats <- bats(x, ic='aicc', seasonal.periods=7)
mod_stl <- stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
}
test <- by(sample,list(sample$group_type,sample$sub_group_type,sample$date, sample$hour
),models)
但是,我收到以下错误:
Error in ets(x, ic = "aicc", restrict = T) : y should be a univariate time series
如果我按如下方式拆分数据并应用 ets() 函数,我可以 运行 它没有任何问题。但是,这种数据拆分对我来说不是一个非常可行的选择,因为组和子组的数量太多,而且每个组都有不同的时间序列模式:
sub_sample_1 <- sample[sample$group_type == "Group 1" & sample$sub_group_type == "Sub Group 1",6]
x <- msts(sub_sample_1, seasonal.periods=24)
mod_arima <- auto.arima(x, ic='aicc', stepwise=F)
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=24, size=10)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=24)
mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
mod_stl <- stlm(x, s.window=24, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
是否有任何变通方法可以让我按列组应用模型而不会遇到任何错误?
此外,并非所有模型都适用于所有组。对于 sub_sample_1 数据,HoltWinters、neuralnet、bats 和 stl 给我错误,其他人正在工作
> mod_hwa <- HoltWinters(x,seasonal = "additive")
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
time series has no or less than 2 periods
> mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
Error in HoltWinters(x, seasonal = "multiplicative") :
data must be non-zero for multiplicative Holt-Winters
> mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
Error in optim(par = param.vector$vect, fn = calcLikelihoodNOTransformed, :
function cannot be evaluated at initial parameters
我能理解为什么这些模型不适用于我的数据。当我应用该功能时出现错误时如何排除它们?
在此先感谢您的帮助!
这个问题与我的另一个问题相似(可能是扩展)here
您当前的设置出现了几个问题:
-
如果未指定
函数 return 最后一行。因此,您的第一次尝试将丢失除
mod_sts
之外的所有行,这将为by
. 的每个子集分配值 在您的子集代码中,您实际上传递了第 6 列(一个原子向量),而您在第一次代码尝试中传递了数据帧的所有列。这可能是您输入错误的原因 msts 文档:
A numeric vector, ts object, matrix or data frame. It is intended that the time series data is univariate, otherwise treated the same as ts().
您的
by
正在接收四个分组,group_type,sub_group_type、date 和 hour 与您的第二个子集代码不同。除非您的数据非常大,否则这么多分组可能会导致行数很少或没有行,因此正如您最后一个代码块所建议的那样,模型过程的数据点不足。
return()
,test
话虽如此,请考虑在 return 通过前两个分组对命名元素列表进行以下调整,指定第 6 列。并且由于 by
采用了多种因素的组合,这些因素在子集数据帧中可能不会产生任何行,因此下面使用 tryCatch
来捕获任何错误,并在最后一行中使用 return 空列表进行过滤。
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
list(
mod_exp = ets(x, ic='aicc', restrict=T),
mod_hwa = HoltWinters(x,seasonal = "additive"),
mod_hwm = HoltWinters(x,seasonal = "multiplicative"),
mod_neural = nnetar(x, p=7, size=25),
mod_tbats = tbats(x, ic='aicc', seasonal.periods=7),
mod_bats = bats(x, ic='aicc', seasonal.periods=7),
mod_stl = stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets'),
mod_sts = StructTS(x)
)
}
# TRY/CATCH TO CAPTURE ERRORS AND RETURN EMPTY LIST
test <- by(sample[,6], list(sample$group_type, sample$sub_group_type),
function(x) tryCatch({ models(x)
}, error=function(e) return(list(NA))))
# TO REMOVE NULLs AND NAs (EMPTY ITEMS)
test <- Filter(function(i) length(i) > 0, test)