Copy multiple data frame rows down through sapply or lapply function instead of for loop in R

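I need to loop over warehouse item data and repeatedly paste that data onto specific months. In my real-world application I am working with 500k rows, and my function takes 5 minutes to run, which is impractical.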

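I need a way to do the same thing with some sort of dplyr or apply-style function, preferably sapply or anything else that can output a data frame. Here is sample data to show the concept: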

library(lubridate)  

# Item Data Frame
item.df <- data.frame(Item = c("A1","A2","A3","A4","A5"), 
        Gross_Profit = c(15,20,8,18,29),
        Launch_Date = c("2001-04-01","2001-04-05","2003-11-03","2015-02-11","2017-06-15"))

# Months Data Frame
five.months <- seq(ymd(paste(year(today()),month(today()),1))-months(5),
                   ymd(paste(year(today()),month(today()),1))-months(1), 
                   by = "month")
five.months.df <- data.frame(Month_Floor = five.months)

# Function to copy Item Data for each Month
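repeat.item <- function(char.item, frame.months) {
  df.item <- NULL

  # seq_len() avoids iterating over c(1, 0) when char.item has no rows
  for (i in seq_len(nrow(char.item))) {
    # Repeat this item's name and launch date once per month
    Item <- rep(char.item[i, 1], nrow(frame.months))
    Launch_Date <- rep(char.item[i, 3], nrow(frame.months))
    df.col <- cbind(frame.months, Item, Launch_Date)
    # Growing df.item with rbind() on every pass copies it each time
    df.item <- rbind(df.item, df.col)
  }

  return(df.item)
}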
# Result
copied.df <- repeat.item(item.df,five.months.df)

The resulting variables look like this:

> item.df
  Item Gross_Profit Launch_Date
1   A1           15  2001-04-01
2   A2           20  2001-04-05
3   A3            8  2003-11-03
4   A4           18  2015-02-11
5   A5           29  2017-06-15

> five.months.df
  Month_Floor
1  2017-03-01
2  2017-04-01
3  2017-05-01
4  2017-06-01
5  2017-07-01

> copied.df
   Month_Floor Item Launch_Date
1   2017-03-01   A1  2001-04-01
2   2017-04-01   A1  2001-04-01
3   2017-05-01   A1  2001-04-01
4   2017-06-01   A1  2001-04-01
5   2017-07-01   A1  2001-04-01
6   2017-03-01   A2  2001-04-05
7   2017-04-01   A2  2001-04-05
8   2017-05-01   A2  2001-04-05
9   2017-06-01   A2  2001-04-05
10  2017-07-01   A2  2001-04-05
11  2017-03-01   A3  2003-11-03
12  2017-04-01   A3  2003-11-03
13  2017-05-01   A3  2003-11-03
14  2017-06-01   A3  2003-11-03
15  2017-07-01   A3  2003-11-03
16  2017-03-01   A4  2015-02-11
17  2017-04-01   A4  2015-02-11
18  2017-05-01   A4  2015-02-11
19  2017-06-01   A4  2015-02-11
20  2017-07-01   A4  2015-02-11
21  2017-03-01   A5  2017-06-15
22  2017-04-01   A5  2017-06-15
23  2017-05-01   A5  2017-06-15
24  2017-06-01   A5  2017-06-15
25  2017-07-01   A5  2017-06-15

I think you can use the built-in merge function:

copied.df = merge(five.months.df, item.df, by=NULL);

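This performs a cross join between the two data frames. If you don't need all the columns (as in your example), you can use subset before the cross join, which should improve performance: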

copied.df = merge(five.months.df, subset(item.df, select=c("Item", "Launch_Date")), by=NULL);
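As a rough sanity check of the cross join, here is a minimal sketch on fixed toy data. The names months.df, items.df, crossed.df and indexed.df are made up for illustration, and the dates are hard-coded so the result does not depend on today(). The rep()-based indexing at the end is not part of the approach above; it is only another base R way to build the same rows, shown for comparison.

```r
# Toy data with hard-coded dates (hypothetical, for illustration only).
months.df <- data.frame(
  Month_Floor = as.Date(c("2017-03-01", "2017-04-01", "2017-05-01"))
)
items.df <- data.frame(
  Item = c("A1", "A2"),
  Launch_Date = c("2001-04-01", "2001-04-05"),
  stringsAsFactors = FALSE
)

# Cross join: by = NULL pairs every month with every item.
crossed.df <- merge(months.df, items.df, by = NULL)

# Comparison (an assumption of this sketch, not from the answer):
# build the same rows by repeating row indices, months cycling within each item.
month.idx <- rep(seq_len(nrow(months.df)), times = nrow(items.df))
item.idx <- rep(seq_len(nrow(items.df)), each = nrow(months.df))
indexed.df <- cbind(months.df[month.idx, , drop = FALSE],
                    items.df[item.idx, , drop = FALSE])
rownames(indexed.df) <- NULL

nrow(crossed.df)  # 3 months * 2 items = 6 rows
```

Both versions avoid growing a data frame inside a loop, which is usually where the time goes in the original function; whether either is fast enough on 500k rows would need to be measured on the real data.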