How to speed up a for loop using lapply?

Question

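I wrote an lapply function to assign stock prices around a specific date to specific companies. All the companies I want to assign stock prices to are in the dataset "peers_per_deal_germany".

My first step is to check, based on the date and the company-specific ISIN, whether such a company exists in my Stock_Prices dataset. If it does, I define the specific time window for which I want the stock data. I then apply some further restrictions, for example that a vector must not contain more than 40 NAs, because more than that would distort my results.

The code works well. However, for about 15,000 companies it takes roughly 1 hour to run. My full dataset contains about 1.8 million companies for which I need stock prices.

Is there any way to speed up this lapply function? Thank you very much for your help.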

get_return_vector_germany <- function(idx, peer_company, SIC, ISIN,
                                      deal, announcement, peer_country) {
  peer <- peer_company[idx]
  SIC <- SIC[idx]
  Deal_Nr <- deal[idx]
  company_ticker <- ISIN[idx]
  announcement_date <- announcement[idx]
  peer_country <- peer_country[idx]
  row <- c()
  vector_stock_prices <- c()
  vector_stock_return <- c()
  vector_stock_prices_event <- c()
  vector_stock_return_event <- c()
  
  if (length(which(Stock_Prices_DE$datadate == announcement_date &
                   Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)) ==
      0) {
    row <- NA
  } else {
    row <- which(Stock_Prices_DE$datadate == announcement_date &
                   Stock_Prices_DE$isin == company_ticker, arr.ind = TRUE)
  }
  
  if (sum(is.na(row) == 1)) {
    vector_stock_prices <- rep(NA, times = 179)
  } else {
    vector_stock_prices <- Stock_Prices_DE[(row - 218):(row - 39),
                                           7]
  }
  
  if (sum(is.na(vector_stock_prices)) > 40) {
    vector_stock_return <- list(rep(NA, times = 179))
  } else {
    vector_stock_return <- list(diff(vector_stock_prices)/
                                  vector_stock_prices[-length(vector_stock_prices)])
  }
  
  if (sum(is.na(row) == 1)) {
    vector_stock_prices_event <- rep(NA, times = 22)
  } else {
    vector_stock_prices_event <- Stock_Prices_DE[(row - 11):(row +
                                                               10), 7]
  }
  
  if (sum(is.na(vector_stock_prices_event)) > 0) {
    vector_stock_return_event <- list(rep(NA, times = 21))
  } else {
    vector_stock_return_event <- list(diff(vector_stock_prices_event)/
                                        vector_stock_prices_event[-length(vector_stock_prices_event)])
  }
  
  vector <- data.frame(cbind(peer, Deal_Nr, SIC, peer_country, vector_stock_return,
                             vector_stock_return_event))
  return(vector)
}


results_vector_germany <- lapply(1:nrow(peers_per_deal_germany),
                                 get_return_vector_germany,
                                 peers_per_deal_germany$peer_company,
                                 peers_per_deal_germany$current_SIC,
                                 peers_per_deal_germany$ISIN_code,
                                 peers_per_deal_germany$deal_nr,
                                 peers_per_deal_germany$current_announcement,
                                 peers_per_deal_germany$peer_country)

Answer 1

Try using mclapply to run the task in parallel.
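As a rough illustration of that suggestion, here is a minimal sketch of swapping lapply for mclapply from the parallel package. The function slow_task and the vector values are hypothetical stand-ins for get_return_vector_germany and the columns of peers_per_deal_germany, and the core count of 2 is an assumption you would adjust to your machine.

```r
library(parallel)

# Hypothetical stand-in for the per-row work; in the question this would be
# get_return_vector_germany() together with its extra arguments.
slow_task <- function(idx, values) {
  values[idx]^2
}

values <- c(1, 2, 3, 4)

# mclapply relies on forking, which is not available on Windows;
# there mc.cores must stay at 1 (the call then runs serially).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same calling convention as lapply: extra arguments are passed on to FUN.
results <- mclapply(seq_along(values), slow_task, values, mc.cores = n_cores)
```

Because the workers are forked from the current session, they can read objects that already exist in it, such as Stock_Prices_DE, without these being passed explicitly. The speedup is limited by the number of cores, so at 1.8 million rows it may also be worth reducing the work done per row.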

