Scala's collection sliding() is inconsistent when the window size is greater than the step
This is sliding() from the Scala collections API:
/** Groups elements in fixed size blocks by passing a "sliding window"
* over them (as opposed to partitioning them, as is done in grouped.)
* @see [[scala.collection.Iterator]], method `sliding`
*
* @param size the number of elements per group
* @param step the distance between the first elements of successive
* groups
* @return An iterator producing ${coll}s of size `size`, except the
* last and the only element will be truncated if there are
* fewer elements than size.
*/
def sliding(size: Int, step: Int): Iterator[Repr] =
A simple way to understand sliding is as (0 until this.length by step).map(i => slice(i, i + size)). But this interpretation breaks down when size > step:
object SlidingTest extends App {
  val n = 10
  val r1 = 0 until n
  val r2 = new Range(start = 0, end = n, step = 1) {
    override def sliding(size: Int, step: Int) =
      (indices by step).iterator.map(i => slice(i, i + size))
  }
  for {
    i <- 1 to 2*n
    j <- 1 to 2*n
    s1 = r1.sliding(i, j).toList.map(_.toList)
    s2 = r2.sliding(i, j).toList.map(_.toList)
    if s1 != s2
  } println(s"Sliding fail for size=$i and step=$j: [s1=$s1; s2=$s2]")
}
Concretely, consider r1 = 0 until 10. According to the documentation, r1.sliding(size = 2, step = 1) should be:
List(List(0, 1), List(1, 2), List(2, 3), List(3, 4), List(4, 5), List(5, 6), List(6, 7), List(7, 8), List(8, 9), List(9))
But it is actually:
List(List(0, 1), List(1, 2), List(2, 3), List(3, 4), List(4, 5), List(5, 6), List(6, 7), List(7, 8), List(8, 9))
(that is, the last, truncated slice is missing).
Another snippet copied from the Scaladoc:
/** Returns an iterator which presents a "sliding window" view of
* another iterator. The first argument is the window size, and
* the second is how far to advance the window on each iteration;
* defaults to `1`. Example usages:
* {{{
* // Returns List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5))
* (1 to 5).iterator.sliding(3).toList
* // Returns List(List(1, 2, 3, 4), List(4, 5))
* (1 to 5).iterator.sliding(4, 3).toList
* // Returns List(List(1, 2, 3, 4))
* (1 to 5).iterator.sliding(4, 3).withPartial(false).toList
* // Returns List(List(1, 2, 3, 4), List(4, 5, 20, 25))
* // Illustrating that withPadding's argument is by-name.
* val it2 = Iterator.iterate(20)(_ + 5)
* (1 to 5).iterator.sliding(4, 3).withPadding(it2.next).toList
* }}}
*
* @note Reuse: $consumesAndProducesIterator
*/
def sliding[B >: A](size: Int, step: Int = 1): GroupedIterator[B] =
new GroupedIterator[B](self, size, step)
What am I doing wrong?
It groups the elements and stops once everything has been grouped.
It does not produce a group at every possible step.
scala> (1 to 100).sliding(size=100,step=1).toList.size
res0: Int = 1
scala> (1 to 100).sliding(size=99,step=1).toList.size
res1: Int = 2
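In your example, you expect it to create an extra group containing only 9, even though the collection has already been exhaustively grouped.
You also show an example in which the remaining elements form a partial group: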
scala> (1 to 5).sliding(size=4,step=3).toList
res4: List[scala.collection.immutable.IndexedSeq[Int]] = List(Vector(1, 2, 3, 4), Vector(4, 5))
Here the extra group is needed because 5 has not been grouped yet.
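One way to read that rule as code, as a rough sketch only: a window starts at every multiple of step, but only while at least one element is still not covered by an earlier window. The names SlidingRule and windowStarts are made up for illustration and are not part of the Scala library.

```scala
object SlidingRule {
  // Hypothetical helper: start offsets of the windows that
  // sliding(size, step) yields over a collection of `length` elements,
  // following the "stop once everything is grouped" reading above.
  def windowStarts(length: Int, size: Int, step: Int): List[Int] = {
    require(size > 0 && step > 0, "size and step must be positive")

    // `covered` is the exclusive end index reached by the previous window.
    @annotation.tailrec
    def loop(start: Int, covered: Int, acc: List[Int]): List[Int] =
      if (start >= length || covered >= length) acc.reverse
      else loop(start + step, start + size, start :: acc)

    loop(start = 0, covered = 0, acc = Nil)
  }
}
```

For 0 until 10 with size = 2 and step = 1 this gives the offsets 0 through 8: the window starting at 8 already covers 9, so no window starts at 9. For 1 to 5 with size = 4 and step = 3 it gives 0 and 3, because the first window leaves 5 uncovered.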
Edit: a possible rewording of the Scaladoc:
An iterator producing ${coll}s of size `size`, except the last element (which may be the only element) will be truncated if there are fewer than `size` elements remaining to be grouped.
Based on @som-snytt's answer, I found a way to express sliding in terms of slice, as follows:
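override def sliding(window: Int, step: Int) = {
  require(window > 0 && step > 0, s"window=$window and step=$step, but both must be positive")
  // Elements shared by consecutive windows; zero when the windows do not overlap.
  val lag = (window - step) max 0
  // A non-empty collection always yields at least one (possibly truncated) window.
  val end = if (isEmpty) 0 else (length - lag) max 1
  Iterator.range(start = 0, end = end, step = step).map(i => slice(i, i + window))
}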
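To check a formula like this without subclassing Range, it can be written as a standalone function and compared against the built-in sliding over the same grid of sizes and steps as the test in the question. This is only a sketch: SlidingViaSlice, slidingViaSlice and mismatches are hypothetical names, it works on Vector rather than Range, and it assumes that a non-empty collection shorter than the window should still yield one truncated window.

```scala
object SlidingViaSlice {
  // Hypothetical standalone version of the slice-based formula.
  def slidingViaSlice[A](xs: Vector[A], window: Int, step: Int): Iterator[Vector[A]] = {
    require(window > 0 && step > 0, s"window=$window and step=$step, but both must be positive")
    // Overlap between consecutive windows; zero when they do not overlap.
    val lag = (window - step) max 0
    // Assumption: a non-empty input always produces at least one window.
    val end = if (xs.isEmpty) 0 else (xs.length - lag) max 1
    Iterator.range(0, end, step).map(i => xs.slice(i, i + window))
  }

  // All (size, step) pairs for which the sketch and the built-in
  // sliding disagree on the collection 0 until n.
  def mismatches(n: Int): Seq[(Int, Int)] = {
    val xs = (0 until n).toVector
    for {
      size <- 1 to 2 * n
      step <- 1 to 2 * n
      if slidingViaSlice(xs, size, step).toList != xs.sliding(size, step).toList
    } yield (size, step)
  }
}
```

Calling SlidingViaSlice.mismatches(10) lists the (size, step) pairs where the two disagree, so an empty result means the formula matches the library on that grid.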