How to create a list of lists and then map a vectorised function over it
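I am looking for help on two specific points:
1) How to create a list of lists from the data frame below (all.df)
2) How to vectorise a function over this list of lists
I am trying to generate forecasts at the customer/product level using the Prophet library.
I am struggling to vectorise the operation.
I currently run a for loop, which I would like to avoid in order to speed up my computation.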
Sample data:
library(lubridate)  # provides dmy()
set.seed(1123)
df1 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "a",
  Product = "xxx",
  Revenue = sample(1:100, 365*2, replace = TRUE))
df2 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "a",
  Product = "yyy",
  Revenue = sample(25:200, 365*2, replace = TRUE))
df3 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "b",
  Product = "xxx",
  Revenue = sample(1:100, 365*2, replace = TRUE))
df4 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "b",
  Product = "yyy",
  Revenue = sample(25:200, 365*2, replace = TRUE))
# stack the four customer/product series into one data frame
all.df <- rbind(df1, df2, df3, df4)
Here is my forecast function:
daily_forecast <- function(df, forecast.days = 365){
  # fit actuals into prophet (df must have columns ds and y)
  m <- prophet(df,
               yearly.seasonality = TRUE,
               weekly.seasonality = TRUE,
               changepoint.prior.scale = 0.55) # default value is 0.05
  # create dummy data frame to hold predictions
  future <- make_future_dataframe(m, periods = forecast.days, freq = "day")
  # run the prediction
  forecast <- predict(m, future)
  ### Select the date and forecast from the model and then merge with actuals
  daily_fcast <- forecast %>% select(ds, yhat) %>% dplyr::rename(Date = ds, fcast.daily = yhat)
  actual.to.merge <- df %>% dplyr::rename(Date = ds, Actual.Revenue = y)
  # return the merged actuals and forecast
  merge(actual.to.merge, daily_fcast, all = TRUE)
}
Currently, I use a for loop, handling one customer/product at a time:
x <- df1 %>% select(-c(Customer, Product)) %>%
  dplyr::rename(ds = Date, y = Revenue) %>%
  daily_forecast()
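Instead, I would like to vectorise the whole operation:
1- Create a list of lists, i.e. split all.df by:
a) Product, then
b) Customer
2- Then map the daily_forecast function over the list of lists created in 1) above
I would very much like to use functions from purrr.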
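Here is how I would do what you ask with purrr:
library(tidyverse)
library(lubridate)
library(prophet)
res <-
  all.df %>%
  split(.$Customer) %>%                 # level 1: one element per customer
  map(~ split(.x, .x$Product)) %>%      # level 2: one data frame per product
  # at_depth() was renamed modify_depth() in purrr 0.2.3
  modify_depth(2, ~ select(.x, ds = Date, y = Revenue)) %>%
  modify_depth(2, daily_forecast)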
str(res)
# List of 2
# $ a:List of 2
# ..$ xxx:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 76 87 87 56 83 17 19 72 92 35 ...
# .. ..$ fcast.daily : num [1:1095] 55.9 57.9 51.9 51.9 54 ...
# ..$ yyy:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 62 87 175 186 168 190 30 192 119 170 ...
# .. ..$ fcast.daily : num [1:1095] 121 121 119 119 116 ...
# $ b:List of 2
# ..$ xxx:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 71 94 81 32 85 59 59 55 50 50 ...
# .. ..$ fcast.daily : num [1:1095] 51.9 54.2 54.5 53.1 51.9 ...
# ..$ yyy:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 105 46 153 136 59 59 34 72 70 85 ...
# .. ..$ fcast.daily : num [1:1095] 103.3 103.3 103.1 103.1 91.5 ...
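To see the shape of this nested result without fitting any Prophet models, here is a minimal base-R sketch of the same two-level split and depth-2 map. The data frame `toy.df` and the function `summarise_group()` are hypothetical stand-ins for `all.df` and `daily_forecast()`, introduced only for illustration.

```r
# Toy data standing in for all.df (hypothetical values)
toy.df <- data.frame(
  Customer = rep(c("a", "b"), each = 4),
  Product  = rep(c("xxx", "yyy"), times = 4),
  Revenue  = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for daily_forecast(): mean revenue of one group
summarise_group <- function(df) mean(df$Revenue)

# Level 1: split by Customer; level 2: split each piece by Product
nested <- lapply(split(toy.df, toy.df$Customer),
                 function(d) split(d, d$Product))

# Apply the function at depth 2, keeping the list-of-lists shape
res_toy <- lapply(nested, function(by_product) {
  lapply(by_product, summarise_group)
})

# Individual results are reached by customer, then product
res_toy$a$xxx
res_toy[["b"]][["yyy"]]
```

Swapping the two `split()` keys would give the product-then-customer nesting described in the question.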
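But the following feels more natural to me (keeping everything in a data frame):
res_2 <-
  all.df %>%
  rename(ds = Date, y = Revenue) %>%
  # tidyr >= 1.0 syntax; older versions used nest(ds, y) and a bare unnest()
  nest(data = c(ds, y)) %>%
  transmute(Customer, Product, res = map(data, daily_forecast)) %>%
  unnest(res)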
# # A tibble: 4,380 × 5
# Customer Product Date Actual.Revenue fcast.daily
# <fctr> <fctr> <date> <int> <dbl>
# 1 a xxx 2017-01-01 76 55.93109
# 2 a xxx 2017-01-02 87 57.92577
# 3 a xxx 2017-01-03 87 51.92263
# 4 a xxx 2017-01-04 56 51.86267
# 5 a xxx 2017-01-05 83 54.04588
# 6 a xxx 2017-01-06 17 52.75289
# 7 a xxx 2017-01-07 19 52.35083
# 8 a xxx 2017-01-08 72 53.91887
# 9 a xxx 2017-01-09 92 55.81202
# 10 a xxx 2017-01-10 35 49.78302
# # ... with 4,370 more rows
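The same "keep it in one data frame" idea can also be sketched in base R alone: split on both keys at once, apply the function to each piece, and bind the pieces back together. As before, `toy.df` and `add_group_mean()` are hypothetical stand-ins for `all.df` and `daily_forecast()`, not part of the question.

```r
# Toy data standing in for all.df (hypothetical values)
toy.df <- data.frame(
  Customer = rep(c("a", "b"), each = 4),
  Product  = rep(c("xxx", "yyy"), times = 4),
  Revenue  = c(10, 20, 30, 40, 50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for daily_forecast(): one output row per input row
add_group_mean <- function(df) {
  data.frame(Revenue = df$Revenue, group.mean = mean(df$Revenue))
}

# One list element per Customer/Product combination;
# drop = TRUE skips combinations that do not occur in the data
groups <- split(toy.df, list(toy.df$Customer, toy.df$Product), drop = TRUE)

# Apply the function to each group and re-attach the group keys
pieces <- lapply(groups, function(g) {
  out <- add_group_mean(g[, "Revenue", drop = FALSE])
  cbind(Customer = g$Customer[1], Product = g$Product[1], out)
})

# Bind everything back into a single data frame
res_flat <- do.call(rbind, pieces)
rownames(res_flat) <- NULL
```

Note that neither `map()` nor `lapply()` is vectorised in the strict sense: each group is still processed in turn, so the benefit over an explicit for loop is mainly clearer code rather than faster model fitting.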