Is there an R function that allows me to group a list together based on identical characters?

i <- 1 
while (i <= length(files)) {
  start <- i 
  end <- start + 3
  
  v <- files[start:end]
  y <- fname[start:end]
  date_range <- substr(y[1], 7, 23)
  html_block <- make_div(v, date_range)
  
  top <- readLines("header.html")
  bottom <- readLines("footer.html")
  
  
  # This will write just the div block
  write(x = html_block, file = paste0(date_range, "-block.html"))
  
  # This will write a working website
  write(x = c(top, "<body>", html_block, "</body>", bottom), 
        file = paste0(date_range, "-website.html"))
  
  i <- i + 4
}

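The code above shows how I got here, and it does almost what I want. The goal is to loop through a lengthy list of files and create one div for each set of files whose names differ only in file type (i.e., everything else in the file name is the same). That is not quite what happens, though, because the loop simply puts every 4 files together. The problem is that the files do not always come in sets of 4 (some sets have only 2 or 3; see fname below). So what I would like to know is whether there is a way to loop through the file names, check whether characters x through y are identical, group the files together if they are, and so on.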

# fname[1:10]
fname <- c("TOR-D-18580907-18580908.tif",                       
"TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS",
"TOR-D-18580907-18580908.tif-FailToProcess-Plot.png",
"TOR-D-18580907-18580908.tif.png",                   
"TOR-D-18580908-18580909.tif",                       
"TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS",
"TOR-D-18580908-18580909.tif-FailToProcess-Plot.png",
"TOR-D-18580908-18580909.tif.png",                   
"TOR-D-18580910-18580911.tif",                       
"TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS")

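Shown above are the first 10 elements of fname (a variable holding all of the file names, of which there are obviously many more than 10). Is there a way to check whether characters 7 through 23 (i.e., 18580907-18580908, the dates) match those of the following elements, group the files together if they do, and then continue in the same way for every file after that?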

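You can use substr() to extract the dates and then group the file names into a list, if that is what you need. You end up with a list keyed by date range, where each element is a vector of file names: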

date = substr(fname,7,23)

directory = list()
for(d in unique(date)){
  directory[[d]]=fname[date==d]
}

The output is:

> directory
$`18580907-18580908`
[1] "TOR-D-18580907-18580908.tif"                       
[2] "TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS"
[3] "TOR-D-18580907-18580908.tif-FailToProcess-Plot.png"
[4] "TOR-D-18580907-18580908.tif.png"                   

$`18580908-18580909`
[1] "TOR-D-18580908-18580909.tif"                       
[2] "TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS"
[3] "TOR-D-18580908-18580909.tif-FailToProcess-Plot.png"
[4] "TOR-D-18580908-18580909.tif.png"                   

$`18580910-18580911`
[1] "TOR-D-18580910-18580911.tif"                       
[2] "TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS"
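
To tie this back to the original loop, here is a rough sketch of iterating over the groups instead of stepping through the files four at a time. make_div() is the asker's own function and is not shown in the question, so a made-up placeholder stands in for it here, and the file-writing steps are left as a comment.

```r
fname <- c("TOR-D-18580907-18580908.tif",
           "TOR-D-18580907-18580908.tif.png",
           "TOR-D-18580908-18580909.tif")

# Placeholder only: the real make_div() is not shown in the question
make_div <- function(files, date_range) {
  paste0("<div id=\"", date_range, "\">",
         paste(files, collapse = ", "), "</div>")
}

# Group the file names by date range, as in the answer above
date <- substr(fname, 7, 23)
directory <- list()
for (d in unique(date)) {
  directory[[d]] <- fname[date == d]
}

# One iteration per date range, however many files it contains
html_blocks <- list()
for (d in names(directory)) {
  html_blocks[[d]] <- make_div(directory[[d]], d)
  # the write() calls from the original loop would go here,
  # e.g. write(x = html_blocks[[d]], file = paste0(d, "-block.html"))
}
```

Because the loop runs over names(directory), a date range with 2 or 3 files is handled the same way as one with 4.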

Edit: I admit that r2evans's answer is better. I forgot about the split() function.

Two thoughts:

  1. If you want a list in which each element is a vector of related file names, then

    groupedlist <- split(fname, substr(fname, 7, 23))
    groupedlist
    # $`18580907-18580908`
    # [1] "TOR-D-18580907-18580908.tif"                        "TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS"
    # [3] "TOR-D-18580907-18580908.tif-FailToProcess-Plot.png" "TOR-D-18580907-18580908.tif.png"                   
    # $`18580908-18580909`
    # [1] "TOR-D-18580908-18580909.tif"                        "TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS"
    # [3] "TOR-D-18580908-18580909.tif-FailToProcess-Plot.png" "TOR-D-18580908-18580909.tif.png"                   
    # $`18580910-18580911`
    # [1] "TOR-D-18580910-18580911.tif"                        "TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS"
    
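  2. If you want a vector (perhaps to add to a data.frame to identify which group each file belongs to) and you would rather not group on the substr(.) substring itself, then you can get an integer representation of the groups with

    as.integer(factor(substr(fname, 7, 23)))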
    #  [1] 1 1 1 1 2 2 2 2 3 3
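
As a small illustration of the second idea, the group ids can be stored next to the file names in a data.frame. The column names file, date_range, and group are made up for this sketch. Note that factor() sorts its levels, so the group numbers follow the sorted order of the date strings rather than the order of first appearance.

```r
fname <- c("TOR-D-18580907-18580908.tif",
           "TOR-D-18580907-18580908.tif.png",
           "TOR-D-18580908-18580909.tif",
           "TOR-D-18580910-18580911.tif")

# One row per file, with the date range taken from characters 7 to 23
files_df <- data.frame(file = fname,
                       date_range = substr(fname, 7, 23),
                       stringsAsFactors = FALSE)

# Integer id per date range (levels are sorted alphabetically)
files_df$group <- as.integer(factor(files_df$date_range))

# Number of files in each group
group_sizes <- table(files_df$group)

# All files that belong to the first group
first_group <- files_df$file[files_df$group == 1]
```

With the ids in a column, filtering or counting per group does not require recomputing the substring each time.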
    

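Another solution, closer to what you already have, is to create a vector of the unique date ranges, extract the group of files for each date range, and then carry on with your task inside the loop: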

library(stringr)
library(dplyr)
fname <- c("TOR-D-18580907-18580908.tif",                       
           "TOR-D-18580907-18580908.tif-FailToProcess-Data.RDS",
           "TOR-D-18580907-18580908.tif-FailToProcess-Plot.png",
           "TOR-D-18580907-18580908.tif.png",                   
           "TOR-D-18580908-18580909.tif",                       
           "TOR-D-18580908-18580909.tif-FailToProcess-Data.RDS",
           "TOR-D-18580908-18580909.tif-FailToProcess-Plot.png",
           "TOR-D-18580908-18580909.tif.png",                   
           "TOR-D-18580910-18580911.tif",                       
           "TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS")

# create a vector of unique date ranges
(date_range_unique_vec <- str_sub(fname, start = 7, end = 23) %>% 
    unique())

for (each_date_range in date_range_unique_vec) {
  
  # extract group of file names for each unique date range
  group_fnames <- fname[str_detect(fname, each_date_range)]
  
  { # from here!!
    html_block <- make_div(group_fnames, each_date_range)
    top <- readLines("header.html")
    bottom <- readLines("footer.html")
    
    # This will write just the div block
    write(x = html_block, file = paste0(each_date_range, "-block.html"))
    
    # This will write a working website
    write(x = c(top, "<body>", html_block, "</body>", bottom),
          file = paste0(each_date_range, "-website.html"))
    
  }# to here, I cannot run locally.  
  

  # print the date range and the file names grouped under it
  cat(each_date_range, "\n")
  cat(group_fnames, "\n")
  cat("\n")
}

Note: I could not test the optional { } block in the middle of the loop on my computer. I think it should be fine, but you may need to tweak it slightly.
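
One detail that may be worth tightening: str_detect() treats its pattern as a regular expression and matches it anywhere in the string. A possible alternative, sketched here in base R, is to compare the same substring that produced the unique values, so a file only joins a group when characters 7 to 23 match exactly. The group_sizes vector is only there to show the result and is not part of the answer above.

```r
fname <- c("TOR-D-18580907-18580908.tif",
           "TOR-D-18580907-18580908.tif.png",
           "TOR-D-18580908-18580909.tif",
           "TOR-D-18580910-18580911.tif",
           "TOR-D-18580910-18580911.tif-FailToProcess-Data.RDS")

# Date range of every file, taken from characters 7 to 23
date_range <- substr(fname, 7, 23)

group_sizes <- integer(0)
for (each_date_range in unique(date_range)) {
  # exact comparison at a fixed position, no regular expression involved
  group_fnames <- fname[date_range == each_date_range]
  group_sizes[each_date_range] <- length(group_fnames)
  cat(each_date_range, ":", length(group_fnames), "file(s)\n")
}
```

The body of the loop can then call make_div() and write() on group_fnames exactly as in the answer above.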