Iterating through a list of parameters using tidyquant in R
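I have a dataset that I want to process with tq_mutate and rollapply, using a range of different parameter values.
Currently I use a for loop to go through all the parameter values, but I am sure this is not the most efficient or fastest way to do it (especially when I want to look at a large number of parameter values). How can I improve or remove the for loop? I suspect the answer involves purrr::map or some other approach (multithreading/multicore etc.), but I have not been able to find useful examples online.
Below is some sample code. Please ignore the simplicity of the dataset and of the function's output; it is for illustration only. What I want to do is iterate over many different V0 values.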
library(dplyr)
library(tidyquant)

# Placeholder for the real computation: ignores the window data and
# returns two values derived from V0 (one per output column).
my_bogus_function <- function(df, V0 = 1925) {
    # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
    # FOR THE PURPOSES OF THE QUESTION
    c(V0, V0 * 2)
}

window_size <- 7 * 24
cnames <- c("foo", "bar")

# Note: Facebook's ticker changed from "FB" to "META" in 2022, so the
# symbol may need updating for the download to succeed today.
df <- c("FB") %>%
    tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>%
    dplyr::select("date", "open")

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT MANNER?
for (i in 1825:1830) {
    df <- df %>%
        tq_mutate(mutate_fun = rollapply,
                  width      = window_size,
                  by.column  = FALSE,
                  FUN        = my_bogus_function,
                  # append ".<i>" to each name, e.g. "foo.1825"
                  col_rename = gsub("$", sprintf(".%d", i), cnames),
                  V0         = i
        )
}
# END OF THE FOR LOOP I WANT FASTER
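Given that R uses a single core by default, I found an improvement by using the parallel, doSNOW and foreach packages, which allow the work to be spread over multiple cores (note that I am on a Windows machine, so some other packages are not available to me).
I am sure there are other answers out there for multithreading/parallelising/vectorising this code.
Here is the code for anyone interested.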
library(dplyr)
library(tidyquant)
library(parallel)
library(doSNOW)
library(foreach)

window_size <- 7 * 24
cnames <- c("foo", "bar")

# Note: Facebook's ticker changed from "FB" to "META" in 2022, so the
# symbol may need updating for the download to succeed today.
df <- c("FB") %>%
    tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>%
    dplyr::select("date", "open")

my_bogus_function <- function(df, V0 = 1925) {
    # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
    # FOR THE PURPOSES OF THE QUESTION
    c(V0, V0 * 2)
}

# CAN THIS LOOP BE DONE IN A MORE EFFICIENT/FASTER MANNER? YES
numCores <- detectCores()  # get the number of cores available
cl <- makeCluster(numCores, type = "SOCK")
registerDoSNOW(cl)

# Function to combine the outputs: join each result on the shared columns
mycombinefunc <- function(a, b) {
    merge(a, b, by = c("date", "open"))
}

# Run the loop over multiple cores.
# .packages loads the required packages on each worker process.
meh <- foreach(i = 1825:1830,
               .combine  = "mycombinefunc",
               .packages = c("dplyr", "tidyquant")) %dopar% {
    df %>%
        # Adjust everything
        tq_mutate(mutate_fun = rollapply,
                  width      = window_size,
                  by.column  = FALSE,
                  FUN        = my_bogus_function,
                  col_rename = gsub("$", sprintf(".%d", i), cnames),
                  V0         = i
        )
}

stopCluster(cl)
# END OF THE FOR LOOP I WANTED FASTER
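The question also mentions purrr::map as a possible alternative to the loop. Below is a minimal sketch of that idea. It assumes it is acceptable to call zoo::rollapply directly rather than going through tq_mutate, and it uses a small synthetic price table so that no download is needed. The names prices and roll_one, the synthetic data, and the shortened window are illustrative assumptions, not part of the original post. Whether this is faster than the loop or the foreach version would need to be measured on the real data.

```r
library(dplyr)
library(purrr)
library(zoo)

# Synthetic stand-in for the downloaded prices (hypothetical data)
set.seed(1)
prices <- tibble(
    date = seq(as.Date("2016-01-01"), by = "day", length.out = 30),
    open = 100 + cumsum(rnorm(30))
)

window_size <- 7  # shortened from 7 * 24 to fit the small example
cnames <- c("foo", "bar")

# Same placeholder as in the post: ignores the window, returns two values
my_bogus_function <- function(x, V0 = 1925) {
    c(V0, V0 * 2)
}

# One rolling pass for a single V0 (illustrative helper).
# Returns only the two new columns, named e.g. "foo.1825" and "bar.1825".
roll_one <- function(v0) {
    out <- rollapply(prices$open, width = window_size,
                     FUN = my_bogus_function, V0 = v0)
    # rollapply yields one row per complete window; pad with NA rows at
    # the top so each value lines up with the last row of its window
    pad <- matrix(NA_real_, nrow = nrow(prices) - nrow(out), ncol = ncol(out))
    out <- rbind(pad, out)
    colnames(out) <- paste(cnames, v0, sep = ".")
    as_tibble(out)
}

# Map over the parameter values, then bind all new columns in one step
new_cols <- map(1825:1830, roll_one)
result <- bind_cols(c(list(prices), new_cols))
```

Each call returns only its own columns, so the results can be bound side by side once at the end instead of being merged pairwise on date and open after every iteration.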