How to extract the date in a file name and sort by it to find the latest file?

Question

I currently have several files in a folder. They contain the daily update of stock on hand. The file names look like this:

Onhand Harian 12 Juli 2019.xlsx
Onhand Harian 13 Juli 2019.xlsx
Onhand Harian 14 Juli 2019.xlsx... and so on.

I only want to read the latest Excel file, based on the date in the file name. How can I do this? Thanks in advance.

Answer 1

If all your files follow the same naming pattern, you can do this:

# List all the file names in the folder
file_names <- list.files("/path/to/folder/", full.names = TRUE)

# Remove all unwanted characters and keep only the date
# Convert the date string to an actual Date object
# (%B matches month names in the current locale, so "Juli" needs a locale that spells it that way)
# Sort them and take the latest file
file_to_read <- file_names[order(as.Date(sub("Onhand Harian ", "", 
       sub("\\.xlsx$", "", basename(file_names))), "%d %B %Y"), decreasing = TRUE)[1]]
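`as.Date()` with `%B` reads month names according to the session locale, so the call above returns `NA` when the locale does not spell the month the way the file names do. A minimal base R sketch that avoids the locale by looking the month up in an explicit vector might look like the following. The helper name `latest_by_name` and the Indonesian month spellings in `months_id` are assumptions for illustration, not something stated in the question; swap in whatever spellings your files actually use.

```r
# Assumed month spellings (Indonesian); adjust to match your file names.
months_id <- c("Januari", "Februari", "Maret", "April", "Mei", "Juni",
               "Juli", "Agustus", "September", "Oktober", "November", "Desember")

# Hypothetical helper: return the file whose name carries the latest date.
latest_by_name <- function(file_names, month_names = months_id) {
  base <- basename(file_names)
  # Capture "<day> <month> <year>" right before the .xlsx extension
  pattern <- "([0-9]{1,2}) ([^ 0-9]+) ([0-9]{4})\\.xlsx$"
  parts <- regmatches(base, regexec(pattern, base))
  dates <- vapply(parts, function(p) {
    if (length(p) != 4) return(NA_real_)   # name did not match the pattern
    m <- match(p[3], month_names)          # month name -> month number
    if (is.na(m)) return(NA_real_)         # unknown month name
    iso <- sprintf("%s-%02d-%02d", p[4], m, as.integer(p[2]))
    as.numeric(as.Date(iso, format = "%Y-%m-%d"))
  }, numeric(1))
  # which.max() skips NA, so unparseable names are ignored
  file_names[which.max(dates)]
}
```

Calling `latest_by_name(list.files("/path/to/folder/", full.names = TRUE))` would then give the path to pass on to whichever Excel reader you use.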

Alternatively, if your files are generated daily, you could also use file.info to select them by creation or modification time; details are in the linked post.
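As a rough sketch of the `file.info` route: the function returns one row per file with an `mtime` column (modification time), so the newest file is the one with the largest `mtime`. The helper name `latest_by_mtime` is hypothetical, and this only gives the right answer if the files were not copied or re-saved in a way that changed their timestamps.

```r
# Hypothetical helper: most recently modified .xlsx file in `folder`.
latest_by_mtime <- function(folder) {
  file_names <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)
  if (length(file_names) == 0) return(NA_character_)  # nothing to pick from
  info <- file.info(file_names)                       # one row per file
  file_names[which.max(as.numeric(info$mtime))]       # largest mtime = newest
}
```

Compared with parsing the file name, this needs no knowledge of the naming pattern, but it trusts the file system rather than the date written in the name.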

Answer 2

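I would do something like this:

library(stringr)
library(tidyverse)

x <- c("Onhand Harian 12 Juli 2019.xlsx",
       "Onhand Harian 13 Juli 2019.xlsx",
       "Onhand Harian 14 Juli 2019.xlsx")

# Month name -> month number. These are the German spellings;
# adapt them to the language used in your file names.
lookup <- set_names(seq_len(12),
                    c("Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
                      "August", "September", "Oktober", "November", "Dezember"))

enframe(x, name = NULL, value = "txt") %>%
  # Pull out "<day> <month> <year>"; backslashes must be doubled in R strings
  mutate(txt_extract = str_extract(txt, "\\d{1,2} \\D{3,9} \\d{4}")) %>% # September is longest ..
  # Split on the single spaces between day, month and year
  separate(txt_extract, c("d", "m", "y"), sep = " ", remove = FALSE) %>%
  # Zero-pad month and day so they can be pasted into a yyyymmdd string
  mutate(m = sprintf("%02d", lookup[m]),
         d = sprintf("%02d", as.integer(d))) %>%
  mutate(date = as.Date(str_c(y, m, d), format = "%Y%m%d")) %>%
  # Keep only the row with the latest date and return its file name
  filter(date == max(date)) %>%
  pull(txt) 
#  "Onhand Harian 14 Juli 2019.xlsx"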

r

date

xlsx