What is the best practice for creating 100s of new columns in a data frame at the highest speed and with the least amount of code?

Question

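I want to apply lag and cummean to 100+ columns. I would prefer to use a data frame to supply the arguments to the function. I have tried lazy evaluation with dplyr, but it fails when the function is executed with mapply, where columns of a data frame are passed as the arguments. I could do this in base R, but I am worried it would be slow, especially on a data frame with 700 variables and 60,000 rows.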

Data frame before

date        name    team     score1 score2 height
1/1/2001     Bill   eagles     1      2     5
1/1/2001    George  eagles     2      7     2
1/1/2001    Aaron   eagles     1      2     4
1/2/2001     Bill   eagles     1      2     5
1/2/2001    George  eagles     2      4     2
1/2/2001    Aaron   eagles     2      2     4
1/3/2001     Bill   eagles     2      3     5
1/3/2001    George  eagles     2      7     2
1/3/2001    Aaron   eagles     1      2     4

Data frame after

date        name    team     score1 score2 height  score1_avg height_average
1/1/2001     Bill   eagles     1      2     5          NA           NA 
1/1/2001    George  eagles     2      7     2          NA           NA 
1/1/2001    Aaron   eagles     1      2     4          NA           NA 
1/2/2001     Bill   eagles     1      2     5          1.33         3.66
1/2/2001    George  eagles     2      4     2          1.33         3.66
1/2/2001    Aaron   eagles     2      2     4          1.33         3.66
1/3/2001     Bill   eagles     2      3     5          1.5          3.66 
1/3/2001    George  eagles     2      7     2          1.5          3.66 
1/3/2001    Aaron   eagles     1      2     4          1.5          3.66

This is what I did for a single column, but I need it to scale to hundreds of columns:

library(dplyr)
df %>%
  group_by(team) %>%
  mutate(score1_avg = lag(cummean(score1)))
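
For reference, one way the single-column pipeline above might be generalized is with `across()`. This is only a sketch: it assumes dplyr 1.0 or later, the sample data frame is rebuilt from the "before" table, and the `_avg` suffix and the `result` name are illustrative choices rather than anything required by dplyr.

```r
library(dplyr)

# Sample data mirroring the "before" table in the question
df <- data.frame(
  date   = rep(c("1/1/2001", "1/2/2001", "1/3/2001"), each = 3),
  name   = rep(c("Bill", "George", "Aaron"), times = 3),
  team   = "eagles",
  score1 = c(1, 2, 1, 1, 2, 2, 2, 2, 1),
  score2 = c(2, 7, 2, 2, 4, 2, 3, 7, 2),
  height = c(5, 2, 4, 5, 2, 4, 5, 2, 4)
)

# Apply lag(cummean(.)) to every column from score1 through height,
# writing each result to a new column named <column>_avg
result <- df %>%
  group_by(team) %>%
  mutate(across(score1:height,
                ~ lag(cummean(.x)),
                .names = "{.col}_avg")) %>%
  ungroup()
```

With hundreds of columns, the `score1:height` range could be swapped for a selection helper such as `where(is.numeric)` or `all_of()` with a character vector of names.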

Answer 1

We can use data.table, which adds the new columns in place by reference with := and therefore avoids copying the data.

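library(data.table)
cols <- names(df)[4:6]  # score1, score2, height
# cummean() comes from dplyr; shift() on a vector already returns a vector
setDT(df)[, paste0(cols, "_avg") := lapply(.SD, function(x)
              shift(dplyr::cummean(x))), by = team, .SDcols = cols]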
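
To make the approach easier to try out, here is a small self-contained sketch that rebuilds the sample data and applies the same idea. The helper `running_mean` is a hypothetical stand-in for `cummean()` written in base R, so that dplyr is not required. The `dt` and `cols` names are likewise only illustrative.

```r
library(data.table)

# Sample data mirroring the "before" table in the question
dt <- data.table(
  date   = rep(c("1/1/2001", "1/2/2001", "1/3/2001"), each = 3),
  name   = rep(c("Bill", "George", "Aaron"), times = 3),
  team   = "eagles",
  score1 = c(1, 2, 1, 1, 2, 2, 2, 2, 1),
  score2 = c(2, 7, 2, 2, 4, 2, 3, 7, 2),
  height = c(5, 2, 4, 5, 2, 4, 5, 2, 4)
)

# Columns to transform; with hundreds of columns this vector could be
# built programmatically, e.g. from names(dt)
cols <- c("score1", "score2", "height")

# Hypothetical helper: cumulative mean in base R
running_mean <- function(x) cumsum(x) / seq_along(x)

# Add one lagged running-mean column per input column, by reference,
# computed separately within each team
dt[, paste0(cols, "_avg") := lapply(.SD, function(x) shift(running_mean(x))),
   by = team, .SDcols = cols]
```

Because `:=` modifies `dt` in place, no intermediate copy of the table is made, which is the main reason this pattern tends to scale well to wide tables.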

Tags: r, dataframe, mapply, dplyr