How to rbind many xlsx files between a start and end date

I have several files in one location. I combine them like this:

library(readxl)
punkty_pgd <- read_xlsx("C:/Users/Desktop/PC- 2021042500.xlsx", sheet = "NEW")
punkty_pgd_2 <- read_xlsx("C:/Users/Desktop/PC- 2021042512.xlsx", sheet = "NEW")
punkty_pgd_3 <- read_xlsx("C:/Users/Desktop/PC- 2021042600.xlsx", sheet = "NEW")
punkty_pgd <- rbind(punkty_pgd, punkty_pgd_2, punkty_pgd_3)

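The file names contain dates, and I would like to restrict the merge so that only files whose names contain a date within a given range, from date_start to date_end, are combined.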

"Loading many files and processing them" is a broader topic in its own right. Setting that aside, let's solve the filename/date problem.

First, since there are some files you do not want to load (this time), we should start by getting the file names.

# the dot must be escaped twice: once for the R string, once for the regex
files <- list.files("C:/Users/Desktop", pattern = "\\.xlsx$", full.names = TRUE)
### add a fake file without a date, for testing
files <- c(files, "Something.xlsx")

files
# [1] "C:/Users/Desktop/PC- 2021042500.xlsx"
# [2] "C:/Users/Desktop/PC- 2021042512.xlsx"
# [3] "C:/Users/Desktop/PC- 2021042600.xlsx"
# [4] "Something.xlsx"                      
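
The pattern argument of list.files is a regular expression, not a glob, so it can help to check what it keeps before pointing it at a real directory. A minimal sketch, using made-up file names (they are assumptions for illustration, not files from the question):

```r
# Hypothetical file names, used only to illustrate the pattern.
candidates <- c("PC- 2021042500.xlsx", "notes.txt", "backup.xlsx.bak", "myxlsx")

# "\\.xlsx$" means: a literal dot, then "xlsx", at the very end of the string.
keep <- grepl("\\.xlsx$", candidates)
candidates[keep]
# [1] "PC- 2021042500.xlsx"
```

If some files might be named with an upper-case extension, list.files also accepts ignore.case = TRUE.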

Now we can determine the date by parsing the file names.

# capture the 10-digit stamp (YYYYMMDDHH); backslashes are doubled inside R strings
files <- cbind(strcapture("\\b([0-9]{10})\\b", files, list(fn = "")), origfn = files)
files
#           fn                               origfn
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx
# 4       <NA>                       Something.xlsx

(strcapture returns a data.frame; for convenience, I'll keep it that way.)
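
As a side illustration of how strcapture behaves, here is a small sketch with two capture groups instead of one. The column names day and hour are assumptions chosen for this example, and the word-boundary anchors are left out to keep it minimal. Each group becomes a column typed like the matching column of the prototype, and a name that does not match yields a row of NA values.

```r
x <- c("PC- 2021042500.xlsx", "PC- 2021042512.xlsx", "Something.xlsx")

# One column per capture group; the prototype sets the column names and types.
parts <- strcapture(
  "([0-9]{8})([0-9]{2})",
  x,
  proto = data.frame(day = character(), hour = integer())
)

parts$day   # "20210425" "20210425" NA
parts$hour  # 0 12 NA
```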

files$date <- as.Date(files$fn, format = "%Y%m%d%H")
files
#           fn                               origfn       date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx 2021-04-26
# 4       <NA>                       Something.xlsx       <NA>
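
Note that as.Date discards the hour part of the stamp, which is why the 00 and 12 files land on the same day. If the hour ever matters for the range, one option (an assumption on my part, not something the question asks for) is to parse into a date-time instead. A short sketch with the same stamps:

```r
stamps <- c("2021042500", "2021042512", NA)

# Date only: the trailing hour is parsed but then dropped.
days <- as.Date(stamps, format = "%Y%m%d%H")
format(days)
# [1] "2021-04-25" "2021-04-25" NA

# Date-time: the hour is kept; the time zone is fixed to UTC here as an assumption.
when <- as.POSIXct(stamps, format = "%Y%m%d%H", tz = "UTC")
format(when, "%H")
# [1] "00" "12" NA
```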

Now we can filter down to just the files we want, within a specific date range.

  • base R:

    dates <- as.Date(c("2021-04-20", "2021-04-25"))
    loadthese <- subset(files, dates[1] <= date & date <= dates[2])
    loadthese
    #           fn                               origfn       date
    # 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
    # 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
    list_of_frames <- lapply(loadthese$origfn, readxl::read_xlsx, sheet = "NEW")
    punkty_pgd <- do.call(rbind, list_of_frames)
    
  • tidyverse:

    library(dplyr)
    library(purrr)
    library(readxl)
    dates <- as.Date(c("2021-04-20", "2021-04-25"))
    punkty_pgd <- files %>%
      filter(between(date, dates[1], dates[2])) %>%
      pull(origfn) %>%
      map_dfr(~ read_xlsx(., sheet = "NEW"))
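
If this needs to be done repeatedly, the selection step can be wrapped in a small function. The sketch below is one possible way to do it, not the only one: select_by_date is a hypothetical name, and it assumes every relevant file name contains a 10-digit YYYYMMDDHH stamp. It works on a character vector of paths, so it can be tried without any files on disk. Unlike subset() and filter(), plain logical indexing does not drop NA for you, hence the explicit !is.na().

```r
# Hypothetical helper: keep only the paths whose 10-digit stamp (YYYYMMDDHH)
# falls between date_start and date_end, both inclusive.
select_by_date <- function(paths, date_start, date_end) {
  stamp <- strcapture("([0-9]{10})", basename(paths),
                      proto = data.frame(stamp = character()))$stamp
  date <- as.Date(stamp, format = "%Y%m%d%H")
  # Paths without a stamp give NA dates; exclude them explicitly.
  keep <- !is.na(date) & date >= as.Date(date_start) & date <= as.Date(date_end)
  paths[keep]
}

paths <- c("C:/Users/Desktop/PC- 2021042500.xlsx",
           "C:/Users/Desktop/PC- 2021042512.xlsx",
           "C:/Users/Desktop/PC- 2021042600.xlsx",
           "Something.xlsx")

selected <- select_by_date(paths, "2021-04-20", "2021-04-25")
selected
# [1] "C:/Users/Desktop/PC- 2021042500.xlsx" "C:/Users/Desktop/PC- 2021042512.xlsx"

# With real files in place, the selected paths can then be read and bound:
# frames <- lapply(selected, readxl::read_xlsx, sheet = "NEW")
# punkty_pgd <- do.call(rbind, frames)
```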