How to rbind many xlsx files with start and end
I have several files in one folder. I combine them like this:
punkty_pgd <- read_xlsx("C:/Users/Desktop/PC- 2021042500.xlsx", sheet = "NEW")
punkty_pgd_2 <- read_xlsx("C:/Users/Desktop/PC- 2021042512.xlsx", sheet = "NEW")
punkty_pgd_3 <- read_xlsx("C:/Users/Desktop/PC- 2021042600.xlsx", sheet = "NEW")
punkty_pgd <- rbind(punkty_pgd, punkty_pgd_2, punkty_pgd_3)
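Their names contain a date, and I'd like to restrict this merge so that only files whose name contains a date within a range, from date_start to date_end, are combined.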
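A good starting point for "load many files and do something with them" is .
Short of that, let's address the filename/date problem.
First, assume there are some files you don't want to load (this time), so we start by getting the file names.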
files <- list.files("C:/Users/Desktop", pattern = "\\.xlsx$", full.names = TRUE)
### add a fake file without a date, for testing
files <- c(files, "Something.xlsx")
files
# [1] "C:/Users/Desktop/PC- 2021042500.xlsx"
# [2] "C:/Users/Desktop/PC- 2021042512.xlsx"
# [3] "C:/Users/Desktop/PC- 2021042600.xlsx"
# [4] "Something.xlsx"
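The pattern has to be written as `"\\.xlsx$"`: in an R string literal a lone `"\."` is an error ("unrecognized escape"), and the doubled backslash is what hands the regex engine `\.`, a literal dot. A quick check on a few made-up names (only the pattern itself comes from the answer):

```r
# The R string "\\.xlsx$" becomes the regex \.xlsx$ : a literal "." then "xlsx" at the very end
candidates <- c("PC- 2021042500.xlsx", "PC- 2021042500xlsx", "notes.xlsx.bak")
matches <- grepl("\\.xlsx$", candidates)
print(matches)
# [1]  TRUE FALSE FALSE
```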
Now we can determine the date by parsing the file names.
files <- cbind(strcapture("\\b([0-9]{10})\\b", files, list(fn = "")), origfn = files)
files
# fn origfn
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx
# 4 <NA> Something.xlsx
(strcapture returns a data.frame; for convenience I keep the original file names in it as well.)
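The same escaping applies to `\b`: written as `"\b"`, R reads it as a backspace character rather than a word boundary, so the pattern needs `"\\b"`. A small self-contained check of the `strcapture()` step on hypothetical paths (nothing is read from disk); the name without a 10-digit stamp comes back as `NA`:

```r
# Hypothetical paths, shaped like the ones listed above
paths <- c("C:/Users/Desktop/PC- 2021042500.xlsx",
           "C:/Users/Desktop/PC- 2021042512.xlsx",
           "Something.xlsx")
# "\\b" reaches the regex engine as \b (word boundary); proto = list(fn = "")
# names the captured column "fn" and keeps it as character
parsed <- strcapture("\\b([0-9]{10})\\b", paths, proto = list(fn = ""))
parsed$origfn <- paths
print(parsed)
print(nchar("\b"))  # one character: the backspace, not a backslash followed by "b"
# [1] 1
```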
files$date <- as.Date(files$fn, format = "%Y%m%d%H")
files
# fn origfn date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
# 3 2021042600 C:/Users/Desktop/PC- 2021042600.xlsx 2021-04-26
# 4 <NA> Something.xlsx <NA>
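Note that `as.Date()` keeps only the day, so the `00` and `12` runs of 25 April both map to `2021-04-25`, and an end date of `"2021-04-25"` includes both of them. If the hour matters, `as.POSIXct()` with the same format keeps it. A sketch, assuming the stamps are UTC (the file names say nothing about time zone):

```r
fn <- c("2021042500", "2021042512", "2021042600", NA)
# as.Date() drops the hour, so the 00h and 12h files get the same date
day <- as.Date(fn, format = "%Y%m%d%H")
# as.POSIXct() keeps the hour; tz = "UTC" is an assumption
stamp <- as.POSIXct(fn, format = "%Y%m%d%H", tz = "UTC")
print(day)
print(stamp)
# Filtering on the timestamp can keep the 00h file of 25 April but drop the 12h one;
# !is.na() excludes names that had no stamp
keep <- !is.na(stamp) &
  stamp >= as.POSIXct("2021-04-25 00:00", tz = "UTC") &
  stamp <  as.POSIXct("2021-04-25 12:00", tz = "UTC")
print(fn[keep])
# [1] "2021042500"
```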
Now we can keep only the files within the desired date range.
Base R:
dates <- as.Date(c("2021-04-20", "2021-04-25"))
loadthese <- subset(files, dates[1] <= date & date <= dates[2])
loadthese
# fn origfn date
# 1 2021042500 C:/Users/Desktop/PC- 2021042500.xlsx 2021-04-25
# 2 2021042512 C:/Users/Desktop/PC- 2021042512.xlsx 2021-04-25
list_of_frames <- lapply(loadthese$origfn, readxl::read_xlsx, sheet = "NEW")
punkty_pgd <- do.call(rbind, list_of_frames)
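To try the base R pipeline end to end without any Excel files, here is a sketch that swaps in CSV files and `read.csv()` for the `.xlsx` files and `read_xlsx()` (an assumption made only so it runs with base R); the directory name `pgd_demo` is made up. It parses `basename()` rather than the full path, so a 10-digit run elsewhere in the path cannot be picked up:

```r
# Sketch: CSV files and read.csv() stand in for the .xlsx files and readxl::read_xlsx().
# Everything is written to a temporary directory.
demo_dir <- file.path(tempdir(), "pgd_demo")
dir.create(demo_dir, showWarnings = FALSE)
stamps <- c("2021042500", "2021042512", "2021042600")
for (i in seq_along(stamps)) {
  write.csv(data.frame(run = stamps[i], value = i),
            file.path(demo_dir, paste0("PC- ", stamps[i], ".csv")), row.names = FALSE)
}
# A file without a date in its name; it should be skipped
write.csv(data.frame(run = "none", value = 0L),
          file.path(demo_dir, "Something.csv"), row.names = FALSE)

# Same steps as the answer: list the files, parse the 10-digit stamp, convert to Date
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
files <- strcapture("\\b([0-9]{10})\\b", basename(paths), proto = list(fn = ""))
files$origfn <- paths
files$date <- as.Date(files$fn, format = "%Y%m%d%H")

# Both bounds are inclusive; subset() drops the row whose date is NA
dates <- as.Date(c("2021-04-20", "2021-04-25"))
loadthese <- subset(files, dates[1] <= date & date <= dates[2])
list_of_frames <- lapply(loadthese$origfn, read.csv)
punkty_pgd <- do.call(rbind, list_of_frames)
print(punkty_pgd)
```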
Tidyverse:
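library(dplyr)   # filter(), between(), pull(), %>%
library(purrr)   # map_dfr()
library(readxl)  # read_xlsx()
dates <- as.Date(c("2021-04-20", "2021-04-25"))
punkty_pgd <- files %>%
  filter(between(date, dates[1], dates[2])) %>%
  pull(origfn) %>%
  map_dfr(~ read_xlsx(., sheet = "NEW"))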
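Once several files are stacked, it is often useful to know which file each row came from. `map_dfr()` has an `.id` argument for that; in base R, a hypothetical helper like `rbind_with_source()` below adds a `source` column before `rbind`. The reader here is a stand-in (`fake_reader`, also hypothetical) so the sketch runs without any files:

```r
# Hypothetical helper (not from the answer): read each file with `reader`,
# add a `source` column holding the file's base name, then stack the results.
rbind_with_source <- function(paths, reader, ...) {
  frames <- lapply(paths, function(p) {
    d <- reader(p, ...)
    d$source <- rep(basename(p), nrow(d))
    d
  })
  do.call(rbind, frames)
}

# Stand-in reader: ignores the path and returns n rows
fake_reader <- function(path, n = 2) data.frame(value = seq_len(n))
out <- rbind_with_source(c("C:/Users/Desktop/PC- 2021042500.xlsx",
                           "C:/Users/Desktop/PC- 2021042512.xlsx"),
                         fake_reader, n = 2)
print(out)
```

With the real files this would be `rbind_with_source(loadthese$origfn, readxl::read_xlsx, sheet = "NEW")`.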