Pick the latest folder from an array of folders whose names contain a date
I have the following list of folder paths:
var allLeafDirPaths= Array(
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200306/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200318/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200319/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20200504/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20201020/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20210302/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220215/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220216/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220223/",
"abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/"
)
I want to pick the latest folder. As you can see, it should be the folder below, generated on 01-Mar-2022:
abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/
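To double-check that the folder name 20220301 really decodes to 1 March 2022, here is a small sketch, assuming Scala 2.13+ on the JVM. It uses java.time's built-in BASIC_ISO_DATE formatter, which is the yyyyMMdd layout these folder names follow. The object FolderDate and its helpers are hypothetical names for illustration only.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object FolderDate {
  // BASIC_ISO_DATE is java.time's standard yyyyMMdd format (offset optional).
  def parse(folderName: String): LocalDate =
    LocalDate.parse(folderName, DateTimeFormatter.BASIC_ISO_DATE)

  // Hypothetical helper: the last non-empty path segment, e.g. "20220301".
  // String.split drops trailing empty strings, and filter removes the
  // empty segment produced by "//" after the scheme.
  def lastSegment(path: String): String =
    path.split("/").filter(_.nonEmpty).last
}
```

Parsing to LocalDate is only a sanity check here; because the names are fixed-width yyyyMMdd, comparing them as integers (as the question and answer do) orders them the same way.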
I tried the code below, and it works, but it is probably not very good. Is there a better way to do this?
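var pathDatesString = Array.empty[String]   // folder names seen so far
var pathDatesInt = Array.empty[Int]
var maxPathDatesInt = 0
var finalPath = ""
var finalPathArray = Array.empty[String]
for(path <- allLeafDirPaths){
  // extract the last path segment, e.g. "20220301"
  pathDatesString +:= path.substring(path.substring(0,path.lastIndexOf("/")).lastIndexOf("/")+1,path.length()-1)
  pathDatesInt = pathDatesString.map(_.toInt)
  maxPathDatesInt = pathDatesInt.reduceLeft(_ max _)
  if(path.contains("/".concat(maxPathDatesInt.toString).concat("/"))){
    finalPath = path
  }
}
finalPathArray = Array("")
finalPathArray +:= finalPath
println("final path is")
println(finalPathArray.mkString("\n"))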
Use maxBy (or maxByOption) and a regular expression instead of the indexOf juggling.
val DatePattern = """.*/(\d+)/$""".r   // triple quotes: "\d" is not a valid escape in a normal string literal
def pathToDate(path: String): Option[Int] = path match {
  case DatePattern(rawDate) => Some(rawDate.toInt)
  case _ => None
}
allLeafDirPaths.maxBy(pathToDate)
// returns "abfss://cont@mystorage.dfs.core.windows.net/customer/Full/20220301/"
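The maxByOption variant mentioned above can be sketched as follows, assuming Scala 2.13 or later (where maxByOption and String.toIntOption exist). The names LatestFolder and latestFolder are hypothetical. Unlike maxBy, it returns None for an empty input instead of throwing, and it drops paths that do not end in a numeric segment rather than ranking them as None.

```scala
import scala.util.matching.Regex

object LatestFolder {
  // Triple quotes keep the backslash in \d literal for the regex engine.
  val DatePattern: Regex = """.*/(\d+)/$""".r

  // toIntOption yields None instead of throwing when the digit run
  // is too long to fit in an Int.
  def pathToDate(path: String): Option[Int] = path match {
    case DatePattern(rawDate) => rawDate.toIntOption
    case _                    => None
  }

  // Pair each parseable path with its date, drop the rest, and keep the
  // pair with the largest date. Empty or fully unparseable input gives None.
  def latestFolder(paths: Seq[String]): Option[String] =
    paths
      .flatMap(p => pathToDate(p).map(date => (p, date)))
      .maxByOption(_._2)
      .map(_._1)
}
```

For the array in the question, LatestFolder.latestFolder(allLeafDirPaths.toSeq) should yield Some of the .../Full/20220301/ path; calling .toSeq avoids relying on the implicit Array-to-Seq conversion.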
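Explanation
maxBy finds the maximum element of a sequence according to a "measuring function". Here you want to measure each path by its date value, so you pass the pathToDate method as the measuring function. maxBy returns the element itself (the path), not the measurement.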
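A minimal sketch of how maxBy and its measuring function behave, assuming Scala 2.13+. The sample words are made-up data. It also shows why a measuring function returning Option[Int] compiles: the standard Ordering for Option ranks None below every Some, so an unparseable path can only win if none of the paths parse.

```scala
object MaxByDemo {
  // Hypothetical sample data, for illustration only.
  val words: Seq[String] = Seq("kiwi", "banana", "fig")

  // maxBy calls the measuring function (here: String length) on each element
  // and returns the element whose measurement is largest.
  val longest: String = words.maxBy(_.length)

  // A measuring function may return any type with an implicit Ordering.
  // For Option[Int], the standard Ordering places None below every Some.
  val noneIsSmallest: Boolean =
    Ordering[Option[Int]].lt(None, Some(Int.MinValue))

  // maxBy throws on an empty collection; maxByOption returns None instead.
  val emptyResult: Option[String] = Seq.empty[String].maxByOption(_.length)
}
```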
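Scala provides a convenient Regex class and a convenience .r method on String (available implicitly via StringOps) that lets you construct a Regex from a String pattern. Regex implements unapplySeq(s: CharSequence), which lets it act as an "extractor object" in a match/case block.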
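A small sketch of a Regex used as an extractor, assuming Scala 2.13+. The Ymd pattern and the helpers are hypothetical. It shows two details that matter for the answer: each capture group binds one variable in the case pattern, and by default the regex must match the whole input, which is why DatePattern starts with .* to skip the path prefix. Calling .unanchored relaxes that to "match anywhere".

```scala
import scala.util.matching.Regex

object RegexExtractorDemo {
  // Hypothetical yyyyMMdd pattern with three capture groups.
  // Triple quotes keep the backslashes literal.
  val Ymd: Regex = """(\d{4})(\d{2})(\d{2})""".r

  // Regex.unapplySeq binds one variable per capture group; the pattern
  // must match the entire input string for the case to succeed.
  def split(s: String): Option[(String, String, String)] = s match {
    case Ymd(year, month, day) => Some((year, month, day))
    case _                     => None
  }

  // .unanchored returns a Regex that may match anywhere inside the input.
  val AnyDate: Regex = Ymd.unanchored
  def containsDate(s: String): Boolean = s match {
    case AnyDate(_, _, _) => true
    case _                => false
  }
}
```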