Kotlin's Iterable and Sequence look exactly the same. Why are two types required?

Both of these interfaces define only one method:

public operator fun iterator(): Iterator<T>

The documentation says Sequence is meant to be lazy. But isn't Iterable lazy too (unless it is backed by a Collection)?

The key difference lies in the semantics of Iterable<T> and Sequence<T> and in how the stdlib extension functions are implemented for each.

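  • For Sequence<T>, the extension functions run lazily wherever possible, much like the intermediate operations of Java Streams. For example, Sequence<T>.map { ... } returns another Sequence<R> and does not actually process any items until a terminal operation such as toList or fold is called.

    Consider this code: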

    val seq = sequenceOf(1, 2)
    val seqMapped: Sequence<Int> = seq.map { print("$it "); it * it } // intermediate
    print("before sum ")
    val sum = seqMapped.sum() // terminal
    

    It prints:

    before sum 1 2
    

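    Sequence<T> is intended for lazy usage and efficient pipelining, when you want to do as little work as possible in the terminal operation, just like Java Streams. However, laziness introduces some overhead, which is undesirable for common simple transformations of smaller collections and makes them less performant.

    In general, there is no good way to determine when laziness is needed, so in the Kotlin stdlib it is made explicit and extracted into the Sequence<T> interface, which avoids applying it to every Iterable by default.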

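  • For Iterable<T>, by contrast, the extension functions with intermediate-operation semantics work eagerly: they process the items right away and return another Iterable. For example, Iterable<T>.map { ... } returns a List<R> containing the mapping results.

    The equivalent code for Iterable: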

    val lst = listOf(1, 2)
    val lstMapped: List<Int> = lst.map { print("$it "); it * it }
    print("before sum ")
    val sum = lstMapped.sum()
    

    It prints:

    1 2 before sum
    

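    As said above, Iterable<T> is non-lazy by default, and this choice works well: in most cases it has good locality of reference and so benefits from the CPU cache, branch prediction, prefetching and so on. As a result, even copying a collection several times still works well enough, and it performs better in simple cases with small collections.

    If you need more control over the evaluation pipeline, you can convert explicitly to a lazy sequence with the Iterable<T>.asSequence() function.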
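To make the difference concrete, here is a minimal sketch that counts how often the map lambda has run before and after a terminal operation. The function name countMapCalls and the counters are made up for illustration; only map, asSequence and sum come from the stdlib.

```kotlin
// Hypothetical helper: returns
// (eager lambda calls before any terminal op,
//  lazy lambda calls before the terminal op,
//  lazy lambda calls after sum()).
fun countMapCalls(): Triple<Int, Int, Int> {
    val numbers = listOf(1, 2, 3)

    var eagerCalls = 0
    // Iterable.map runs the lambda for every element immediately.
    val eager: List<Int> = numbers.map { eagerCalls++; it * it }

    var lazyCalls = 0
    // Sequence.map only wraps the source; the lambda has not run yet.
    val lazy: Sequence<Int> = numbers.asSequence().map { lazyCalls++; it * it }
    val lazyCallsBeforeTerminal = lazyCalls

    // The terminal operation drives the pipeline.
    val lazySum = lazy.sum()
    check(lazySum == eager.sum())

    return Triple(eagerCalls, lazyCallsBeforeTerminal, lazyCalls)
}
```

Calling countMapCalls() should give Triple(3, 0, 3): the eager version has already done all its work, while the sequence does nothing until sum() is called.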

To add to hotkey's answer:

It is important to notice how Sequence and Iterable iterate over your elements:

Sequence example:

list.asSequence().filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}

Log result:

filter - Map - Each; filter - Map - Each

Iterable example:

list.filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}

filter - filter - Map - Map - Each - Each
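The same comparison can be reproduced without Android's Log by collecting the trace in a list. This is only a sketch: traceSequence, traceIterable and the sample input are invented for illustration.

```kotlin
// Records the order in which the stages run when the pipeline is lazy.
fun traceSequence(input: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    input.asSequence()
        .filter { trace.add("filter"); it > 0 }
        .map { trace.add("map"); it }
        .forEach { trace.add("each") }
    return trace
}

// Same pipeline on the list itself: each stage finishes before the next starts.
fun traceIterable(input: List<Int>): List<String> {
    val trace = mutableListOf<String>()
    input
        .filter { trace.add("filter"); it > 0 }
        .map { trace.add("map"); it }
        .forEach { trace.add("each") }
    return trace
}
```

For listOf(1, 2), traceSequence yields filter, map, each, filter, map, each, while traceIterable yields filter, filter, map, map, each, each.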

Iterable is mapped to the java.lang.Iterable interface on the JVM, and is implemented by commonly used collections, like List or Set. The collection extension functions on these are evaluated eagerly, which means they all immediately process all elements in their input and return a new collection containing the result.

Here’s a simple example of using the collection functions to get the names of the first five people in a list whose age is at least 21:

val people: List<Person> = getPeople()
val allowedEntrance = people
    .filter { it.age >= 21 }
    .map { it.name }
    .take(5)

First, the age check is done for every single Person in the list, with the result put in a brand new list. Then, the mapping to their names is done for every Person who remained after the filter operator, ending up in yet another new list (this is now a List<String>). Finally, there’s one last new list created to contain the first five elements of the previous list.

In contrast, Sequence is a new concept in Kotlin to represent a lazily evaluated collection of values. The same collection extensions are available for the Sequence interface, but these immediately return Sequence instances that represent a processed state of the data, without actually processing any elements. To start processing, the Sequence has to be terminated with a terminal operator; these are basically a request to the Sequence to materialize the data it represents in some concrete form. Examples include toList, toSet, and sum, to mention just a few. When these are called, only the minimum required number of elements will be processed to produce the demanded result.

Transforming an existing collection to a Sequence is pretty straightforward: you just need to use the asSequence extension. As mentioned above, you also need to add a terminal operator, otherwise the Sequence will never do any processing (again, lazy!).

val people: List<Person> = getPeople()
val allowedEntrance = people.asSequence()
    .filter { it.age >= 21 }
    .map { it.name }
    .take(5)
    .toList()

In this case, the Person instances in the Sequence are each checked for their age; if they pass, their name is extracted and added to the result list. This is repeated for each person in the original list until five people are found. At this point, the toList function returns a list, and the rest of the people in the Sequence are not processed.
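As a rough way to see the saving, the sketch below runs both variants on sample data and counts the age checks. Person, samplePeople, eagerEntrance and lazyEntrance are stand-ins invented for this illustration; the text does not show the real Person class or getPeople().

```kotlin
// Stand-in for the Person type used in the text (assumed shape).
data class Person(val name: String, val age: Int)

// Hypothetical sample data: 100 people whose ages cycle through 18..27.
fun samplePeople(): List<Person> = (1..100).map { Person("P$it", 18 + it % 10) }

// Eager version: returns the names plus the number of age checks performed.
fun eagerEntrance(people: List<Person>): Pair<List<String>, Int> {
    var ageChecks = 0
    val names = people
        .filter { ageChecks++; it.age >= 21 }
        .map { it.name }
        .take(5)
    return names to ageChecks
}

// Lazy version: same pipeline, but driven element by element by toList().
fun lazyEntrance(people: List<Person>): Pair<List<String>, Int> {
    var ageChecks = 0
    val names = people.asSequence()
        .filter { ageChecks++; it.age >= 21 }
        .map { it.name }
        .take(5)
        .toList()
    return names to ageChecks
}
```

With this sample data both functions return P3 through P7, but the eager version checks all 100 ages while the lazy one stops after the seventh person.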

There’s also something extra a Sequence is capable of: it can contain an infinite number of items. With this in perspective, it makes sense that operators work the way they do - an operator on an infinite sequence could never return if it did its work eagerly.

As an example, here’s a sequence that will generate as many powers of 2 as required by its terminal operator (ignoring the fact that this would quickly overflow):

generateSequence(1) { n -> n * 2 }
    .take(20)
    .forEach(::println)
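A related sketch, in case the cut-off depends on the values rather than on a count: takeWhile stops pulling from the infinite sequence as soon as its predicate fails. The function powersOfTwoBelow is a made-up name, and Long is used here only to push the overflow point further out.

```kotlin
// Powers of two strictly below `limit`.
// The limit is capped so that doubling never overflows before takeWhile stops.
fun powersOfTwoBelow(limit: Long): List<Long> {
    require(limit <= (1L shl 62)) { "limit too large for this sketch" }
    return generateSequence(1L) { it * 2 } // infinite: 1, 2, 4, 8, ...
        .takeWhile { it < limit }          // lazily stops at the first value >= limit
        .toList()                          // terminal operation
}
```

For example, powersOfTwoBelow(1000) gives the ten values from 1 to 512.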

You can find more here

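Iterable is good enough for most use cases; the way iteration is performed on it works very well with caches because of spatial locality. The issue is that the whole collection must pass through the first intermediate operation before it moves on to the second, and so on.

In a sequence, each item passes through the full pipeline before the next item is handled.

The stage-by-stage behaviour of Iterable can hurt the performance of your code, especially when iterating over large data sets. So if your terminal operation is very likely to terminate early, a sequence should be the preferred choice, because you save time by not performing unnecessary operations. For example: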

sequence.filter { getFilterPredicate() }   
        .map    { getTransformation()  } 
        .first  { getSelector() }

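In the above case, if the first item satisfies the filter predicate and, after the map transformation, also meets the selection criterion, then filter, map and first are each invoked only once.

In the case of an Iterable, the whole collection must first be filtered, then mapped, and only then does the first selection start.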
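A small sketch of that claim, with concrete lambdas in place of the placeholders above. StageCounts, lazyCounts, eagerCounts and the predicates are all invented for the illustration.

```kotlin
// How many times each stage's lambda was invoked.
data class StageCounts(val filter: Int, val map: Int, val first: Int)

// Sequence: the first element flows through all stages, then the pipeline stops.
fun lazyCounts(items: List<Int>): StageCounts {
    var f = 0; var m = 0; var s = 0
    items.asSequence()
        .filter { f++; it > 0 }
        .map { m++; it * 10 }
        .first { s++; it >= 10 }
    return StageCounts(f, m, s)
}

// Iterable: filter and map each process the whole list before first() runs.
fun eagerCounts(items: List<Int>): StageCounts {
    var f = 0; var m = 0; var s = 0
    items
        .filter { f++; it > 0 }
        .map { m++; it * 10 }
        .first { s++; it >= 10 }
    return StageCounts(f, m, s)
}
```

For listOf(1, 2, 3, 4) the sequence version reports one call per stage, while the Iterable version reports four filter calls, four map calls and one first call.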