Stream API - How does the sorted() operation work if a filter() is placed right after it?

Take the following code, which sorts a list and then filters it:

import java.util.List;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<Integer> list = List.of(3, 2, 1);

        List<Integer> filtered = list.stream()
                .sorted() // Does sorted() sort the entire list first? Then pass the entire sorted output to filter?
                .filter(x -> x < 3)
                .collect(Collectors.toList());

        System.out.println(filtered);
    }
}

Does the entire sorted() step happen first, and is its result then passed to filter()?

Then isn't that a violation of what streams are supposed to do?

I mean, they are supposed to process one element at a time.

No, it isn't. Have a look at the documentation of the Stream API:

Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.

Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream. As a result, under parallel computation, some pipelines containing stateful intermediate operations may require multiple passes on the data or may need to buffer significant data. Pipelines containing exclusively stateless intermediate operations can be processed in a single pass, whether sequential or parallel, with minimal data buffering.

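This means that sorted is aware of all previously encountered elements, i.e. it is stateful. map and filter, on the other hand, don't need this information: they are stateless and lazy, and they always process elements from the stream source one at a time.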
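To make the "one at a time" behaviour of stateless operations concrete, here is a minimal sketch. The class name StatelessDemo and the trace() helper are made up for illustration; the lambdas record the order in which they are invoked instead of printing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the question's code.
class StatelessDemo {

    // Returns the order in which filter and map were invoked.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<Integer> result = Stream.of(3, 2, 1)
                .filter(x -> { log.add("filter " + x); return x < 3; })
                .map(x -> { log.add("map " + x); return x * 10; })
                .collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints: filter 3, filter 2, map 2, filter 1, map 1, result [20, 10]
        // (one entry per line)
        trace().forEach(System.out::println);
    }
}
```

Each element travels through the whole pipeline before the next one is pulled from the source, which is why the filter and map entries are interleaved.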

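Technically, it is impossible to sort the contents of the pipeline by looking at each element in isolation. sorted operates on all elements "at once" and hands out a sorted stream to the next operation. You can think of sorted as if it becomes a new source of the stream.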
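Applied to the list from the question, this can be observed with a peek() on each side of sorted(). The sketch below is only an illustration: SortedBarrierDemo and trace() are hypothetical names, and the peek() calls write into a list rather than to the console.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class built around the question's pipeline.
class SortedBarrierDemo {

    // Records what is seen before and after sorted().
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<Integer> filtered = List.of(3, 2, 1).stream()
                .peek(x -> log.add("before sorted: " + x))
                .sorted()                                  // buffers all elements
                .peek(x -> log.add("after sorted: " + x))
                .filter(x -> x < 3)
                .collect(Collectors.toList());
        log.add("result: " + filtered);
        return log;
    }

    public static void main(String[] args) {
        // before sorted: 3, before sorted: 2, before sorted: 1,
        // after sorted: 1, after sorted: 2, after sorted: 3, result: [1, 2]
        trace().forEach(System.out::println);
    }
}
```

All "before sorted" entries appear ahead of the first "after sorted" entry: sorted() has to consume its whole input, and only then does filter() receive elements, again one by one.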

Let's take a look at the following stream and analyze how it will be processed:

Stream.of("foo", "bar", "Alice", "Bob", "Carol")
    .filter(str -> !str.contains("r")) // lazy processing
    .peek(System.out::println)
    .map(String::toUpperCase)          // lazy processing
    .peek(System.out::println)
    .sorted()                          // <--- all data is being dumped into memory
    .peek(System.out::println)
    .filter(str -> str.length() > 3)   // lazy processing
    .peek(System.out::println)
    .findFirst();                      // <--- the terminal operation

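The filter and map operations that precede sorted are applied lazily to each element from the stream source, and only when needed. I.e. filter will be applied to "foo", which successfully passes the filter and gets transformed by the map operation. Then filter gets applied to "bar", which won't reach map. Then it's the turn of "Alice" to pass the filter, and map gets executed on this string. And so on.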

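Keep in mind that sorted() requires all the data to do its job, so the first filter will be executed for all the elements from the source, and map will be applied to every element that passed the filter.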

Then the sorted() operation will dump all the contents of the stream into memory and sort the elements that passed the first filter.

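After sorting, the elements will again be processed one at a time. Hence the second filter will be applied only once, although 3 elements have passed the first filter and been sorted. "ALICE" (the upper-cased "Alice") will pass the second filter and reach the terminal operation findFirst(), which returns this string wrapped in an Optional.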
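The reason the second filter runs only once is that findFirst() short-circuits. A rough way to check this is to count the calls to the second filter for two different terminal operations. Everything named here (ShortCircuitDemo, pipeline, callsWithFindFirst, callsWithCollect) is hypothetical, Locale.ROOT is added only to keep the upper-casing locale-independent, and the counts in the comments are what I would expect on a current JDK.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the pipeline mirrors the one above without peek().
class ShortCircuitDemo {

    // Builds the pipeline and counts how often the second filter is called.
    static Stream<String> pipeline(AtomicInteger secondFilterCalls) {
        return Stream.of("foo", "bar", "Alice", "Bob", "Carol")
                .filter(str -> !str.contains("r"))
                .map(str -> str.toUpperCase(Locale.ROOT))
                .sorted()
                .filter(str -> {
                    secondFilterCalls.incrementAndGet();
                    return str.length() > 3;
                });
    }

    static int callsWithFindFirst() {
        AtomicInteger calls = new AtomicInteger();
        pipeline(calls).findFirst();                  // stops after the first match
        return calls.get();
    }

    static int callsWithCollect() {
        AtomicInteger calls = new AtomicInteger();
        pipeline(calls).collect(Collectors.toList()); // needs every element
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsWithFindFirst()); // 1
        System.out.println(callsWithCollect());   // 3
    }
}
```

With collect() the second filter sees all three sorted elements; with findFirst() the pipeline stops as soon as "ALICE" gets through.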

Take a look at the debugging output produced by peek(): the execution proceeds as described above.
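For reference, here is a sketch of the same pipeline with each peek() labelled and writing into a list, so the order of events can be inspected. PeekTraceDemo, trace() and the label strings are invented for this illustration, and Locale.ROOT is an addition to keep the upper-casing locale-independent. The trace in the comment assumes a current JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical demo class; same pipeline as above, with labelled peek() calls.
class PeekTraceDemo {

    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of("foo", "bar", "Alice", "Bob", "Carol")
                .filter(str -> !str.contains("r"))
                .peek(str -> log.add("after 1st filter: " + str))
                .map(str -> str.toUpperCase(Locale.ROOT))
                .peek(str -> log.add("after map: " + str))
                .sorted()
                .peek(str -> log.add("after sorted: " + str))
                .filter(str -> str.length() > 3)
                .peek(str -> log.add("after 2nd filter: " + str))
                .findFirst();
        return log;
    }

    public static void main(String[] args) {
        // after 1st filter: foo,   after map: FOO,
        // after 1st filter: Alice, after map: ALICE,
        // after 1st filter: Bob,   after map: BOB,
        // after sorted: ALICE,     after 2nd filter: ALICE
        trace().forEach(System.out::println);
    }
}
```

"bar" and "Carol" never show up because the first filter rejects them, and "BOB" and "FOO" never show up after sorted() because findFirst() is already satisfied by "ALICE".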