Memoise on a recursive function
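Introduction
I have a function that takes a date as input, spends some time on a computation (simulated here with Sys.sleep()), removes every '-' from the date, and returns a character string: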
library(magrittr)
auxialiaryCompute = function(vDate)
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
}
> auxialiaryCompute(as.Date("2015-01-14"))
[1] "20150114"
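Cool. The output above is '20150114'. Now I would like this function to also receive earlier outputs: the output for the previous day, or for the two previous days, or for the n previous days, going back no further than a fixed date called loopBackMaxDate.
Naive recursion
Here is one possible recursive implementation: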
compute = function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias
    # previous dates in the look-back window, strictly after loopBackMaxDate
    dates = Filter(function(x) x>d(loopBackMaxDate),
                   getPreviousDates(loopBackDays, d(vDate)))
    # base case: nothing left to look back at
    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))
    # recurse on every previous date, then compute the current one
    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))
    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
}
auxialiaryCompute = function(vDate, previousOutputs=list())
{
    Sys.sleep(1) # stands in for the expensive computation
    vDate %>% as.character %>% gsub("-", "", .)
}
getPreviousDates = function(loopBackDays, vDate)
{
    if(loopBackDays==0) return(NULL)
    seq.Date(from=vDate-loopBackDays, to=vDate-1, by="days")
}
With this, I get the same result as before (in about 1 second):
> compute(as.Date("2015-01-14"))
[1] "20150114"
And the following takes 4 seconds:
> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
user system elapsed
0.00 0.00 3.99
Next I want to compute the following, which takes 3 seconds:
> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
user system elapsed
0.02 0.00 3.01
This is really bad, because I am recomputing the results for vDate="2014-05-04", vDate="2014-05-03" and vDate="2014-05-02", which had already been computed during the call to compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1)...
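To make the redundancy concrete, here is a minimal sketch in base R that records which dates reach the expensive step instead of sleeping. The names calls, slowStep and computeNaive are hypothetical and exist only for this illustration; the recursion is meant to mirror the unmemoised compute above, under the assumption that the previous outputs themselves do not matter for the count.

```r
# Sketch only: record every date that reaches the "expensive" step.
calls <- new.env()
calls$log <- character(0)

slowStep <- function(vDate) {
  calls$log <- c(calls$log, format(vDate))  # recorded instead of Sys.sleep(1)
  gsub("-", "", format(vDate))
}

getPreviousDates <- function(loopBackDays, vDate) {
  if (loopBackDays == 0) return(NULL)
  seq(from = vDate - loopBackDays, to = vDate - 1, by = "days")
}

computeNaive <- function(vDate, loopBackMaxDate = vDate, loopBackDays = 0) {
  vDate <- as.Date(vDate)
  dates <- getPreviousDates(loopBackDays, vDate)
  dates <- dates[dates > as.Date(loopBackMaxDate)]
  # recurse first, then run the expensive step for the current date
  lapply(dates, function(u) computeNaive(u, loopBackMaxDate, loopBackDays))
  slowStep(vDate)
}

computeNaive("2014-05-05", "2014-05-01", 1)
firstRun <- calls$log

calls$log <- character(0)
computeNaive("2014-05-04", "2014-05-01", 1)
secondRun <- calls$log

# every date touched by the second call was already touched by the first
repeated <- intersect(firstRun, secondRun)
```

Inspecting firstRun, secondRun and repeated shows that the second call only revisits dates the first call had already processed.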
Memoised recursion
Here is my attempt using memoise:
library(memoise)
compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias
    dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate)))
    if(length(dates)==0)
        return(auxialiaryCompute(vDate=vDate, previousOutputs=list()))
    previousOutputs = lapply(dates, function(u) compute(u, loopBackMaxDate, loopBackDays))
    auxialiaryCompute(vDate=vDate, previousOutputs=previousOutputs)
})
auxialiaryCompute = memoise(function(vDate, previousOutputs=list())
{
    Sys.sleep(1)
    vDate %>% as.character %>% gsub("-", "", .)
})
First run (takes 4 seconds, as expected):
> system.time(compute("2014-05-05", loopBackMaxDate="2014-05-01", loopBackDays=1))
user system elapsed
0.00 0.00 4.01
The second run takes 1 second, whereas I expected 0 seconds:
> system.time(compute("2014-05-04", loopBackMaxDate="2014-05-01", loopBackDays=1))
user system elapsed
0.00 0.00 0.99
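I think I am getting something completely wrong somewhere... I could store the outputs in a global variable, but I would really like to make this work with memoisation or continuation-passing style and avoid redundant computations!
If anyone has an idea, I would be very grateful!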
OK, first, I added some logging to the auxiliaryCompute function:
library(logging) # provides loginfo()
basicConfig()    # default console handler so the messages are printed
compute = memoise(function(vDate, loopBackMaxDate=vDate, loopBackDays=0)
{
    d = as.Date # short alias
    dates = Filter(function(x) x>d(loopBackMaxDate), getPreviousDates(loopBackDays, d(vDate)))
    if(length(dates)==0)
    {
        loginfo("I reached the tail!")
        # vDate2 spelled out in full rather than relying on partial argument matching
        return(auxiliaryCompute(vDate2=vDate, previousOutputs=0))
    }
    previousOutputs = lapply(dates, function(u){
        compute(vDate=u, loopBackMaxDate=loopBackMaxDate, loopBackDays=loopBackDays)
    })
    auxiliaryCompute(vDate2=vDate, previousOutputs=previousOutputs)
})
auxiliaryCompute = memoise(function(vDate2, previousOutputs)
{
    loginfo("-------arguments in auxiliaryCompute are: vDate %s , previousOutputs %s", vDate2, unlist(previousOutputs))
    # Sys.sleep(1)
    vDate2 %>% as.character %>% gsub("-", "", .)
})
> compute("2015-01-10", "2015-01-01", 2)
2015-01-20 18:53:12 INFO::I reached the tail!
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-02 , previousOutputs 0
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-03 , previousOutputs 20150102
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-04 , previousOutputs 20150102,20150103
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-05 , previousOutputs 20150103,20150104
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-06 , previousOutputs 20150104,20150105
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-07 , previousOutputs 20150105,20150106
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-09 , previousOutputs 20150107,20150108
2015-01-20 18:53:12 INFO::-------arguments: vDate 2015-01-10 , previousOutputs 20150108,20150109
[1] "20150110"
> compute("2015-01-08", "2015-01-01", 2)
2015-01-20 18:54:11 INFO::-------arguments: vDate 2015-01-08 , previousOutputs 20150106,20150107
[1] "20150108"
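The first log looks fine: each date is visited exactly once (no repeats, thanks to memoise). Strangely, though, in the second log auxiliaryCompute is called with the arguments vDate 2015-01-08 , previousOutputs 20150106,20150107 even though that call had already been executed (it appears in the first log).
The other dates were remembered correctly... only the first one was not... That is because it was passed as a string, whereas the other dates in the recursion had been coerced to Date, so the two calls do not share a cache entry.
Simply pass a Date as the argument: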
> compute(as.Date("2015-01-08"), "2015-01-01", 2)
[1] "20150108"
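The effect can be reproduced without the recursion. The sketch below uses a toy memoiser written in base R (toyMemoise is a hypothetical stand-in that keys the cache on the deparsed arguments; it is not how the memoise package is implemented) to show that a character string and a Date for the same day end up under different cache keys.

```r
# Toy memoiser, for illustration only: the cache key is built from the
# exact argument values, so the type of the argument is part of the key.
toyMemoise <- function(f) {
  cache <- new.env()
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

runs <- new.env()
runs$n <- 0

stripDashes <- toyMemoise(function(vDate) {
  runs$n <- runs$n + 1                 # counts real executions of the body
  gsub("-", "", as.character(vDate))
})

stripDashes(as.Date("2015-01-08"))     # body runs
stripDashes(as.Date("2015-01-08"))     # served from the cache
afterDates <- runs$n

stripDashes("2015-01-08")              # same day, but a character: new key
afterString <- runs$n

# the two inputs are not the same R object, even though they print alike
sameValue <- identical("2015-01-08", as.Date("2015-01-08"))
```

Comparing afterDates with afterString shows whether the string call reused the entry created by the Date calls.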
This is really sneaky, partly because R is not a strongly typed language, but mainly because I coded this very badly by mixing up dates and strings!
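One way to make this harder to get wrong is to coerce the input once, in a thin entry point, before it reaches anything cached. The following is a minimal base-R sketch of that idea; computeCached, computeNormalised and the hand-rolled cache are hypothetical names for this illustration and are not part of memoise. With memoise, the analogous arrangement would presumably be an unmemoised wrapper that calls as.Date() and then the memoised function.

```r
# Sketch: normalise the argument type at the boundary so that
# "2015-01-08" and as.Date("2015-01-08") share one cache entry.
cache <- new.env()
runs <- new.env()
runs$n <- 0

# Cached worker: expects a Date and keys the cache on its ISO string
computeCached <- function(vDate) {
  key <- format(vDate, "%Y-%m-%d")
  if (is.null(cache[[key]])) {
    runs$n <- runs$n + 1               # counts real computations
    cache[[key]] <- gsub("-", "", key)
  }
  cache[[key]]
}

# Entry point: always coerce to Date before touching the cache
computeNormalised <- function(vDate) computeCached(as.Date(vDate))

fromString <- computeNormalised("2015-01-08")
fromDate <- computeNormalised(as.Date("2015-01-08"))
```

Both calls go through as.Date(), so the second one finds the entry stored by the first.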