Is there an R function that groups a list by identical characters but separates the items that are in a subdirectory?

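My goal is to make a div that contains all the individual files, separated by their respective dates. What I used in the answer to that question worked for me; however, I ran into an issue because some files have the same date but are located in a separate subdirectory named "FailedToProcess". They show up in the same div, so their links are displayed there as well. Is it possible to distinguish the two, so that the files under the "FailedToProcess" subdirectory are not shown there and instead appear in another, separate div? Thanks!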

Here is my code:

# create a vector of unique date ranges
(date_range_unique_vec <- str_sub(fname, start = 7, end = 23) %>% 
    unique())

for (each_date_range in date_range_unique_vec) {
  
  # extract group of file names for each unique date range
  group_fnames <- files[str_detect(files, each_date_range)]
  
  {
    html_block <- make_div(group_fnames, each_date_range)
    top <- readLines("header.html")
    bottom <- readLines("footer.html")
    
    # This will write just the div block
    write(x = html_block, file = paste0(each_date_range, "-block.html"))
    
    # This will write a working website
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, "-website.html"))
    
  }  
  
  cat(each_date_range, "\n")
  cat(group_fnames, "\n")
  cat("\n")
}

Edit:

files <- list.files(recursive = TRUE)

file_name <- strsplit(files, "/")

# extract the file names themselves
fname <- unlist(lapply(file_name, FUN = function(x) { 
  if(length(x) == 2) { x[2] } else { x[3] } }))

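One way to achieve your goal is as follows.

The main difference from what you had is that the HTML-writing functions are called twice inside the loop: first for the processed files, then for the unprocessed ones. The length()>0 condition just makes sure that you only try to write an HTML file when processed or unprocessed files actually exist for a given date range.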

library(dplyr)
library(stringr)

# representative sample of files
files <- c("1858/FailedToProcess/TOR-D-18580907-18580908.tif", 
                     "1858/FailedToProcess/TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS",
                     "1858/FailedToProcess/TOR-D-18580907-18580908.tif-FailToProcess-Plot.png",
                     "1858/FailedToProcess/TOR-D-18580907-18580908.tif.png",
                     "1858/FailedToProcess/TOR-D-18580908-18580909.tif",                  
                     "1858/FailedToProcess/TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS",
                     "1858/FailedToProcess/TOR-D-18580908-18580909.tif-FailToProcess-Plot.png",
                     "1858/FailedToProcess/TOR-D-18580908-18580909.tif.png",
                     "1858/FailedToProcess/TOR-D-18580910-18580911.tif",                       
                     "1858/FailedToProcess/TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS",
                     "1858/TOR-D-18580910-18580911.tif",                       
                     "1858/TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS",
                     "1939/AGC-D-19390310-19390312.tif", 
                     "1939/AGC-D-19390310-19390312.tif.png",
                     "1939/AGC-D-19390310-19390312.tif-FailToProcess-Data.RDS",
                     "1940/A06-D-19400306-19400306.tif", 
                     "1940/A06-D-19400306-19400306.tif.png",
                     "1940/A06-D-19400306-19400306.tif-FailToProcess-Data.RDS",
                     "1941/A02-D-19410302-19410302.tif", 
                     "1941/A02-D-19410302-19410302.tif.png",
                     "1941/A02-D-19410302-19410302.tif-FailToProcess-Data.RDS")

# you can get the file name (without full path) with basename()
fname <- basename(files)

# create a vector of unique date ranges from all file names
(date_ranges_vec <- str_sub(fname, start = 7, end = 23) %>% 
        unique())

# since these do not change across dates, read them once outside the loop for speed
top <- readLines("header.html")
bottom <- readLines("footer.html")

for (each_date_range in date_ranges_vec) {
    
    # extract group of file names for each unique date range. Processed first
    group_fnames <- files[str_detect(files, each_date_range) & !str_detect(files, "/FailedToProcess/")]
    
    # check that there is at least one file that respects above conditions
    if (length(group_fnames)>0) {
        cat(each_date_range, ": Writing processed.", "\n")
        html_block <- make_div(group_fnames, each_date_range)
        write(x = html_block, file = paste0(each_date_range, "-block.html"))
        write(x = c(top, "<body>", html_block, "</body>", bottom),
                    file = paste0(each_date_range, "-website.html"))
    }
    
    # extract group of file names for each unique date range. Not processed
    group_fnames_fail <- files[str_detect(files, each_date_range) & str_detect(files, "/FailedToProcess/")]
    
    # check that there is at least one file that respects above conditions
    if (length(group_fnames_fail)>0) {
        cat(each_date_range, ": Writing not processed.", "\n")
        html_block <- make_div(group_fnames_fail, each_date_range)
        # "-failed" in the file name keeps these outputs from overwriting
        # the processed ones written above for the same date range
        write(x = html_block, file = paste0(each_date_range, "-failed-block.html"))
        write(x = c(top, "<body>", html_block, "</body>", bottom),
                    file = paste0(each_date_range, "-failed-website.html"))
    }
}

18580907-18580908 : Writing not processed. 
18580908-18580909 : Writing not processed. 
18580910-18580911 : Writing processed. 
18580910-18580911 : Writing not processed. 
19390310-19390312 : Writing processed. 
19400306-19400306 : Writing processed. 
19410302-19410302 : Writing processed. 

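In this setup, the divs will be ordered by date. If you would rather have all the processed files first and then all the unprocessed ones, you can do that by running the loop twice: once handling only the processed files, and once handling only the unprocessed ones.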
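
A minimal sketch of that two-pass ordering, under some assumptions: make_div() here is only a placeholder standing in for your real function, build_divs() is a hypothetical helper name, and the file list is a small subset of the sample above. It uses the base R equivalents of str_sub() and str_detect() (substr() and grepl()) so it runs without extra packages, and it collects the divs in a vector instead of writing files.

```r
# Small subset of the sample paths used above
files <- c("1858/FailedToProcess/TOR-D-18580907-18580908.tif",
           "1858/FailedToProcess/TOR-D-18580910-18580911.tif",
           "1858/TOR-D-18580910-18580911.tif",
           "1939/AGC-D-19390310-19390312.tif")

# Placeholder for the real make_div(), which is not shown in this thread
make_div <- function(group_fnames, date_range) {
    paste0("<div id='", date_range, "'>",
           paste(group_fnames, collapse = ", "), "</div>")
}

# Unique date ranges taken from characters 7 to 23 of each file name
date_ranges_vec <- unique(substr(basename(files), 7, 23))

# TRUE for files that sit under the FailedToProcess subdirectory
is_failed <- grepl("/FailedToProcess/", files, fixed = TRUE)

# Hypothetical helper: one div per date range, built from a subset of files
build_divs <- function(subset_files) {
    divs <- character(0)
    for (each_date_range in date_ranges_vec) {
        group_fnames <- subset_files[grepl(each_date_range, subset_files, fixed = TRUE)]
        # skip date ranges that have no file in this subset
        if (length(group_fnames) > 0) {
            divs <- c(divs, make_div(group_fnames, each_date_range))
        }
    }
    divs
}

# First pass: processed files only. Second pass: failed files only.
all_divs <- c(build_divs(files[!is_failed]), build_divs(files[is_failed]))
print(all_divs)
```

Within each pass the divs are still ordered by date, so the result is all processed date ranges followed by all failed ones.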