Pick the latest folder from an Array of folders where the folder name contains the date

I have the following list of folder paths:

var allLeafDirPaths= Array(
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200306/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200318/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200319/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200504/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20201020/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20210302/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220215/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220216/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220223/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/"
)

I want to select the latest folder. As you can see, it should be the folder below, which was generated on 01-Mar-2022:

abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/

I tried the code below, which works. But my code may not be very good. Is there a better way to do this?

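// Declarations assumed here; the original snippet did not show them.
var pathDatesString = Array[String]()
var pathDatesInt = Array[Int]()
var maxPathDatesInt = 0
var finalPath = ""
var finalPathArray = Array[String]()

for(path <- allLeafDirPaths){  
  // Take the last path segment (the yyyyMMdd folder name) and prepend it to the dates seen so far
  pathDatesString +:= path.substring(path.substring(0,path.lastIndexOf("/")).lastIndexOf("/")+1,path.length()-1)
  pathDatesInt = pathDatesString.map(_.toInt)
  maxPathDatesInt = pathDatesInt.reduceLeft(_ max _)
  
  // Keep the current path if it holds the largest date seen so far
  if(path.contains("/".concat(maxPathDatesInt.toString).concat("/"))){
     finalPath = path
  }
}
finalPathArray = Array("")
finalPathArray +:= finalPath
println("final path is")
println(finalPathArray.mkString("\n"))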

Use maxBy or maxByOption and a regular expression instead of the indexOf manipulation.

// Triple quotes keep \d as a regex escape; in a plain "..." literal it is an invalid escape
val DatePattern = """.*/(\d+)/$""".r
def pathToDate(path: String): Option[Int] = path match {
  case DatePattern(rawDate) => Some(rawDate.toInt)
  case _ => None
}

allLeafDirPaths.maxBy(pathToDate)
// returns "abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/"

Explanation

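maxBy finds the maximum element of a sequence according to a "measuring function". Here you want to measure each path by its date value, so you supply the pathToDate method as the measuring function. Because pathToDate returns an Option[Int], any path that does not match the pattern measures as None, which orders below every Some and therefore never wins.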
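As a rough sketch of that behaviour, the snippet below runs the same measuring function over a few made-up paths (samplePaths is hypothetical, not from the question), one of which has no date. It also shows maxByOption, which I believe is only available from Scala 2.13 onward, as a way to avoid the exception maxBy throws on an empty collection.

```scala
val DatePattern = """.*/(\d+)/$""".r

// Measuring function: Some(date) when the last segment is numeric, None otherwise
def pathToDate(path: String): Option[Int] = path match {
  case DatePattern(rawDate) => Some(rawDate.toInt)
  case _ => None
}

// Hypothetical sample paths, for illustration only
val samplePaths = Array("data/Full/20200306/", "data/Full/20220301/", "data/Full/notes/")

// None orders below every Some, so "data/Full/notes/" cannot be selected
val latest = samplePaths.maxBy(pathToDate)

// maxBy throws on an empty collection; maxByOption (Scala 2.13+) returns None instead
val latestOfEmpty = Seq.empty[String].maxByOption(pathToDate)
```

If the list of paths can be empty, or you are on Scala 2.12 or earlier, checking for emptiness before calling maxBy is one possible alternative.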

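Scala provides a Regex class and a convenience .r method on String (available implicitly through StringOps) that lets you construct a Regex from a String pattern. Regex implements unapplySeq(s: CharSequence), which lets it act as an "extractor object" in match/case blocks.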
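To illustrate the extractor behaviour, here is a minimal sketch with a hypothetical YearMonthDay pattern and describe helper (neither appears in the answer above). Each capture group binds to one variable in the case clause, and by default the pattern has to match the entire input string for the case to succeed.

```scala
import scala.util.matching.Regex

// Hypothetical pattern: three capture groups for year, month and day
val YearMonthDay: Regex = """(\d{4})(\d{2})(\d{2})""".r

def describe(s: String): String = s match {
  // Each capture group is bound, in order, to y, m and d
  case YearMonthDay(y, m, d) => s"$y-$m-$d"
  case _ => "no match"
}

val matched = describe("20220301")
// The leading "x" means the whole string no longer matches the pattern
val unmatched = describe("x20220301")
```

This whole-string matching is why the answer's DatePattern starts with .*/ to consume everything before the date segment.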