Java Streams - Using a setter inside map()

My colleague and I have been discussing the view that we should not use setters inside stream.map(), as is done in the solution suggested here -

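The comments on that answer discourage using map this way, but do not give a reason why it is a bad idea. Can someone provide a possible scenario in which this would break?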

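I have seen some discussions about concurrent modification of the collection itself, by adding or removing items, but is there any downside to using map merely to set some values on a data object?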

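I think the real issue here is that it is simply bad practice and violates the intended use of the feature. For example, the same thing can be accomplished with filter. That misrepresents its purpose and makes the code cluttered or, at best, unnecessarily verbose.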

   public static void main(String[] args) {
      List<MyNumb> foo =
            IntStream.range(1, 11).mapToObj(MyNumb::new).collect(
                  Collectors.toList());
      System.out.println(foo);
      // Misuse of filter(): the predicate mutates each element and always returns true
      foo = foo.stream().filter(i ->
      {
         i.value *= 10;
         return true;
      }).collect(Collectors.toList());
      System.out.println(foo);

   }


    class MyNumb {
       int value;

       public MyNumb(int v) {
          value = v;
       }
       public String toString() {
          return Integer.toString(value);
       }
    }

So, back to the original example. There is no need to use map at all, which results in the following rather ugly mess.

foos = foos.stream()
            .filter(foo -> {
                boolean b = foo.isBlue();
                if (b) {
                    foo.setTitle("Some value");
                }
                return b;
            })
            .collect(Collectors.toList());
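
For comparison, here is a minimal sketch of one way to keep the selection and the mutation apart: the stream only filters with a side-effect-free predicate, and the setter is called afterwards in a separate step on the resulting list. The nested Foo class with isBlue, setTitle and getTitle is an assumption that stands in for the class of the original example, which is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterThenUpdate {
    // Hypothetical stand-in for the Foo class of the original example.
    static class Foo {
        private final boolean blue;
        private String title = "";

        Foo(boolean blue) { this.blue = blue; }
        boolean isBlue() { return blue; }
        void setTitle(String title) { this.title = title; }
        String getTitle() { return title; }
    }

    // Selects the blue elements with a side-effect-free predicate,
    // then updates them in a separate, explicit step.
    static List<Foo> keepBlueAndRetitle(List<Foo> foos) {
        List<Foo> blue = foos.stream()
                .filter(Foo::isBlue)
                .collect(Collectors.toList());
        // Iterable.forEach on the finished list, not a stage of the stream pipeline
        blue.forEach(foo -> foo.setTitle("Some value"));
        return blue;
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(new Foo(true), new Foo(false), new Foo(true));
        List<Foo> result = keepBlueAndRetitle(foos);
        System.out.println(result.size() + " blue elements retitled");
    }
}
```

Whether a plain loop would read even better than this is a matter of taste; the point is only that the predicate passed to filter no longer changes anything.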

To name a few:

  • map() with a setter is interfering (it modifies the initial data), while the specs require a non-interfering function. For more details read
  • map() with a setter is stateful (your logic may depend on the initial value of the field you're updating), while the specs require a stateless function
  • even if you are not interfering with the collection you are iterating over, the side effect of the setter is unnecessary
  • a setter in map may mislead future code maintainers
  • etc...
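
One way to avoid the problems listed above is to have map() return a new object instead of mutating the one it receives. The following is only a sketch under assumptions: the immutable Item class and its withTitle copy method are hypothetical names introduced for illustration and are not part of the original example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapWithoutSetter {
    // Hypothetical immutable value class; not part of the original example.
    static final class Item {
        private final String title;

        Item(String title) { this.title = title; }
        String getTitle() { return title; }

        // Returns a modified copy and leaves this instance untouched.
        Item withTitle(String newTitle) { return new Item(newTitle); }
    }

    // The mapping function only reads its argument and returns a new object,
    // so the elements of the source list are never modified by the pipeline.
    static List<Item> retitled(List<Item> source, String newTitle) {
        return source.stream()
                .map(item -> item.withTitle(newTitle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> original = Arrays.asList(new Item("a"), new Item("b"));
        List<Item> updated = retitled(original, "Some value");
        System.out.println(original.get(0).getTitle()); // the original element keeps its title
        System.out.println(updated.get(0).getTitle());  // the copy carries the new title
    }
}
```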

Streams are not just a new set of APIs that make things easier for you. They also bring the functional programming paradigm with them.

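Moreover, the most important aspect of the functional programming paradigm is to use pure functions for computation. A pure function is one whose output depends only on its input. So, basically, the Streams API should be used with stateless, side-effect-free, pure functions.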

Quoting Joshua Bloch's Effective Java (3rd Edition):

If you’re new to streams, it can be difficult to get the hang of them. Merely expressing your computation as a stream pipeline can be hard. When you succeed, your program will run, but you may realize little if any benefit. Streams isn’t just an API, it’s a paradigm based on functional programming. In order to obtain the expressiveness, speed, and in some cases parallelizability that streams have to offer, you have to adopt the paradigm as well as the API. The most important part of the streams paradigm is to structure your computation as a sequence of transformations where the result of each stage is as close as possible to a pure function of the result of the previous stage. A pure function is one whose result depends only on its input: it does not depend on any mutable state, nor does it update any state. In order to achieve this, any function objects that you pass into stream operations, both intermediate and terminal, should be free of side-effects. Occasionally, you may see streams code that looks like this snippet, which builds a frequency table of the words in a text file:

// Uses the streams API but not the paradigm--Don't do this!
Map<String, Long> freq = new HashMap<>();
try (Stream<String> words = new Scanner(file).tokens()) {
    words.forEach(word -> {
        freq.merge(word.toLowerCase(), 1L, Long::sum);
    });
}

What’s wrong with this code? After all, it uses streams, lambdas, and method references, and gets the right answer. Simply put, it’s not streams code at all; it’s iterative code masquerading as streams code. It derives no benefits from the streams API, and it’s (a bit) longer, harder to read, and less maintainable than the corresponding iterative code. The problem stems from the fact that this code is doing all its work in a terminal forEach operation, using a lambda that mutates external state (the frequency table). A forEach operation that does anything more than present the result of the computation performed by a stream is a “bad smell in code,” as is a lambda that mutates state. So how should this code look?

// Proper use of streams to initialize a frequency table
Map<String, Long> freq;
try (Stream<String> words = new Scanner(file).tokens()) {
    freq = words
        .collect(groupingBy(String::toLowerCase, counting()));
}
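
The quoted snippets rely on a file variable and on static imports that are not shown. As a self-contained sketch of the collector-based version, the following replaces the file with an in-memory string and splits it on whitespace with Pattern.splitAsStream instead of using Scanner.tokens(). Both substitutions are assumptions made only so that the example runs on its own, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordFrequency {
    // Builds a case-insensitive frequency table without mutating external state:
    // the collector creates and fills the map as part of the pipeline.
    static Map<String, Long> frequencies(String text) {
        return Pattern.compile("\\s+")
                .splitAsStream(text.trim())
                .filter(word -> !word.isEmpty()) // guards against empty input
                .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
    }

    public static void main(String[] args) {
        // In-memory input used instead of a file; purely illustrative.
        Map<String, Long> freq = frequencies("To be or not to be");
        System.out.println(freq);
    }
}
```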

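Using side effects in map, such as calling a setter, has a lot in common with using peek for non-debugging purposes, which has already been discussed elsewhere.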

There is a good piece of general advice:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.

Regarding the related practical problems that were named, I have to quote myself:

The important thing you have to understand, is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all.

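When you put an operation with side effects into a map function, you have specific expectations about which elements it will be executed on, and perhaps even how it will be executed, e.g. in which order. Whether those expectations are met depends on the other, subsequent Stream operations and perhaps even on subtle implementation details.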

To show some examples:

IntStream.range(0, 10) // outcome changes with Java 9
    .mapToObj(i -> System.out.append("side effect on "+i+"\n"))
    .count();

IntStream.range(0, 2) // outcome changes with Java 10 (or 8u222)
    .flatMap(i -> IntStream.range(i * 5, (i+1) * 5 ))
    .map(i -> { System.out.println("side effect on "+i); return i; })
    .anyMatch(i -> i > 3);

IntStream.range(0, 10) // outcome may change with every run
    .parallel()
    .map(i -> { System.out.println("side effect on "+i); return i; })
    .anyMatch(i -> i > 6);
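
The snippets above depend on the Java version or on thread scheduling. A more deterministic sketch of the same point, applied to a setter, might look as follows: the setter inside map() runs only for the elements that the terminal operation actually pulls through the pipeline. The Box class and all method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazySideEffects {
    // Hypothetical mutable data object.
    static class Box {
        boolean touched;
        void setTouched(boolean touched) { this.touched = touched; }
    }

    static List<Box> boxes(int n) {
        List<Box> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(new Box());
        }
        return list;
    }

    static long touchedCount(List<Box> list) {
        long count = 0;
        for (Box b : list) {
            if (b.touched) count++;
        }
        return count;
    }

    // No terminal operation: the pipeline is only described, never executed,
    // so the setter is not called at all.
    static long withoutTerminalOperation() {
        List<Box> list = boxes(5);
        Stream<Box> pending = list.stream().map(b -> { b.setTouched(true); return b; });
        return touchedCount(list);
    }

    // Short-circuiting terminal operation: findFirst() does not need every
    // element, so the setter is not called on all five boxes.
    static long withFindFirst() {
        List<Box> list = boxes(5);
        list.stream().map(b -> { b.setTouched(true); return b; }).findFirst();
        return touchedCount(list);
    }

    public static void main(String[] args) {
        System.out.println("touched without terminal operation: " + withoutTerminalOperation());
        System.out.println("touched with findFirst: " + withFindFirst());
    }
}
```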

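Furthermore, as already mentioned in the linked answer, even if you have a terminal operation that processes all elements and is ordered, there is no guarantee about the processing order (or concurrency, for parallel streams) of the intermediate operations.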

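When you have a stream without duplicates, a terminal operation that processes all elements, and a map function that only calls a trivial setter, the code may happen to do what you want, but it depends on so many subtle surrounding conditions that it will become a maintenance nightmare. This brings us back to the first quote about using an API in an unintended way.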