Split a large sequence of logs into smaller sequences with overlapping start times

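I want to split a large sequence of logs into smaller sequences, where consecutive sequences overlap in the start times they cover.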

For example, suppose we have

largeLogs = {
[startTime=A, duration=22],
[startTime=B, duration=12],
[startTime=C, duration=34],
[startTime=D, duration=12],
[startTime=E, duration=18],
[startTime=F, duration=8]
}

The requested output should be:

{[[startTime=A, duration=22],
[startTime=B, duration=12],
[startTime=C, duration=34]],

[[startTime=B, duration=12],
[startTime=C, duration=34],
[startTime=D, duration=12]],

[[startTime=C, duration=34],
[startTime=D, duration=12],
[startTime=E, duration=18]]}

In Python I wrote the following:

def split_func(batchSize, logs):
    batchSize = min(batchSize, len(logs)-1)
    return [logs[i:i+batchSize] for i in range(len(logs) - batchSize+1)]

Since I am new to Scala, I tried to write it as follows, but I ran into an error and got stuck on the last line:

def split_func(batchSize:Int, partialLogs: ListBuffer[Array[Byte]] ) : ListBuffer[Array[Byte]] = {

    batchSize = Math.min(batchSize, partialLogs.size - 1) // getting error reassignment to val

    val i = 0 to partialLogs.size - batchSize+1

    return [lst[i:i+n] // no idea how to change this line from python to scala

There is a Scala method called sliding that does what you want:

partialLogs.sliding(batchSize, batchSize-overlapSize)

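The first argument is the size of each chunk, and the second is the step, that is, the distance between the starts of consecutive chunks.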
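
As a rough sketch of how this could replace the Python function, the example below wraps sliding in a small method. The names SlidingExample, splitFunc and size are made up for illustration, and the logs are plain strings rather than the Array[Byte] values from the question. Method parameters in Scala are vals, which is why reassigning batchSize fails; the clamped value goes into a new val instead. One assumption differs from the Python version: the size is forced to be at least 1, because sliding rejects a non-positive size.

```scala
object SlidingExample {
  // Hypothetical port of the Python split_func (names are illustrative).
  // Parameters cannot be reassigned, so the clamped size is stored in a new val.
  def splitFunc[A](batchSize: Int, logs: Seq[A]): List[Seq[A]] = {
    // Assumption: keep the size at 1 or more, since sliding requires a positive size.
    val size = math.max(1, math.min(batchSize, logs.size - 1))
    // With a single argument, the step defaults to 1.
    logs.sliding(size).toList
  }

  def main(args: Array[String]): Unit = {
    val logs = List("A", "B", "C", "D", "E", "F")

    // Chunks of 3 with step 1: neighbouring chunks share two elements.
    println(splitFunc(3, logs))
    // List(List(A, B, C), List(B, C, D), List(C, D, E), List(D, E, F))

    // Chunks of 4 with an overlap of 2, i.e. step = 4 - 2 = 2.
    println(logs.sliding(4, 2).toList)
    // List(List(A, B, C, D), List(C, D, E, F))
  }
}
```

Note that sliding returns an Iterator, so the sketch calls toList to materialise the chunks; if the input is very large it may be preferable to consume the iterator directly.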