Java stream's iterator forces flatMap to traverse the substream before getting the first item
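I need to create an iterator from a stream. Both the parent stream and the child streams are composed of non-interfering, stateless operations, so the obvious strategy is to use flatMap.
It turns out that the iterator traverses the entire first substream on the first call to hasNext(), and I don't understand why. Although iterator() is a terminal operation, it is clearly stated that it should not consume the stream.
I need the objects produced by the substreams to be generated one at a time.
To reproduce the behavior, I mocked up my real code with an example that shows the same thing: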
import java.util.Iterator;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class FreeRunner {

    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger();
        // x counts 1..4; only the even values pass the filter, giving A2 and A4
        Iterator<C> iterator = Stream.generate(() -> null)
                .takeWhile(y -> x.incrementAndGet() < 5)
                .filter(y -> x.get() % 2 == 0)
                .map(n -> new A("A" + x.get()))
                .flatMap(A::getBStream)
                .filter(Objects::nonNull)
                .map(B::toC)
                .iterator();
        while (iterator.hasNext()) {
            System.out.println("after hasNext()");
            C next = iterator.next();
            System.out.println(next);
        }
    }

    private static class A {
        private final String name;

        public A(String name) {
            this.name = name;
            System.out.println(" > created " + name);
        }

        // Emits a B for odd counter values (B1, B3) and null for even ones
        public Stream<B> getBStream() {
            AtomicInteger c = new AtomicInteger();
            return Stream.generate(() -> null)
                    .takeWhile(x -> c.incrementAndGet() < 5)
                    .map(n -> c.get() % 2 == 0 ? null : new B(this.name + "->B" + c.get()));
        }

        public String toString() {
            return name;
        }
    }

    private static class B {
        private final String name;

        public B(String name) {
            this.name = name;
            System.out.println(" >> created " + name);
        }

        public String toString() {
            return name;
        }

        public C toC() {
            return new C(this.name + "+C");
        }
    }

    private static class C {
        private final String name;

        public C(String name) {
            this.name = name;
            System.out.println(" >>> created " + name);
        }

        public String toString() {
            return name;
        }
    }
}
When executed, it prints:
> created A2
>> created A2->B1
>>> created A2->B1+C
>> created A2->B3
>>> created A2->B3+C
after hasNext()
A2->B1+C
after hasNext()
A2->B3+C
> created A4
>> created A4->B1
>>> created A4->B1+C
>> created A4->B3
>>> created A4->B3+C
after hasNext()
A4->B1+C
after hasNext()
A4->B3+C
Process finished with exit code 0
While debugging, it is clear that iterator.hasNext() triggers the creation of the B and C objects.
The desired behavior, instead, is:
> created A2
>> created A2->B1
>>> created A2->B1+C
after hasNext()
A2->B1+C
>> created A2->B3
>>> created A2->B3+C
after hasNext()
A2->B3+C
> created A4
>> created A4->B1
>>> created A4->B1+C
after hasNext()
A4->B1+C
>> created A4->B3
>>> created A4->B3+C
after hasNext()
A4->B3+C
What am I missing here?
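This is the closest I could get to the behavior you want. I am posting it here to help the discussion. In your example you have two identifiers: one used when creating the A objects and another used when creating the B objects. In this code, those identifiers are created up front using the same logic as yours (although I replaced AtomicInteger with IntStream). flatMap is still used, but no longer at the point where the objects are created.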
import java.util.Iterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FreeRunner3 {

    public static void main(String[] args) throws InterruptedException {
        Iterator<C> iterator = IntStream.range(1, 5)
                .filter(i -> i % 2 == 0)
                .boxed()
                // Build the (A id, B id) pairs first; no objects are created here
                .flatMap(i -> IntStream.range(1, 5)
                        .filter(j -> j % 2 != 0)
                        .mapToObj(j -> new int[] { i, j }))
                .collect(Collectors.toList())
                .stream()
                // Objects are created only past this point, one pair at a time
                .map(id -> new A("A" + id[0]).toB(id[1]))
                .map(B::toC)
                .iterator();
        while (iterator.hasNext()) {
            System.out.println("after hasNext()");
            C next = iterator.next();
            System.out.println(next);
        }
    }

    private static class A {
        private final String name;

        public A(String name) {
            this.name = name;
            System.out.println(" > created " + name);
        }

        public B toB(int i) {
            return new B(this.name + "->B" + i);
        }

        public String toString() {
            return name;
        }
    }

    private static class B {
        private final String name;

        public B(String name) {
            this.name = name;
            System.out.println(" >> created " + name);
        }

        public String toString() {
            return name;
        }

        public C toC() {
            return new C(this.name + "+C");
        }
    }

    private static class C {
        private final String name;

        public C(String name) {
            this.name = name;
            System.out.println(" >>> created " + name);
        }

        public String toString() {
            return name;
        }
    }
}
One difference in this implementation is that a new A instance is created for each B instance:
> created A2
>> created A2->B1
>>> created A2->B1+C
after hasNext()
A2->B1+C
> created A2
>> created A2->B3
>>> created A2->B3+C
after hasNext()
A2->B3+C
> created A4
>> created A4->B1
>>> created A4->B1+C
after hasNext()
A4->B1+C
> created A4
>> created A4->B3
>>> created A4->B3+C
after hasNext()
A4->B3+C
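I thought I would include this as well to help the discussion (hopefully nobody downvotes). It does the opposite of what you want: it eagerly creates all the objects before the iterator is used.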
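Iterator<C> iterator = IntStream.range(1, 5)
        .filter(i -> i % 2 == 0)
        .mapToObj(i -> new A("A" + i))
        .flatMap(A::getBStream)
        // Assumes getBStream() from the question, which emits nulls; skip them before B::toC
        .filter(Objects::nonNull)
        .map(B::toC)
        .collect(Collectors.toList()) // builds every C eagerly, before iteration starts
        .iterator();
while (iterator.hasNext()) {
    System.out.println("after hasNext()");
    C next = iterator.next();
    System.out.println(next);
}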
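I found a way out, but I had to sacrifice the laziness of the main stream. As I posted in the comments above, the problem I tried to simplify with the mock code is about reading an Excel file sheet by sheet (filtered by sheet name) and iterating over all the rows to create objects from the data in the spreadsheet.
The original idea still looks good to me, but apparently the Stream.iterator() implementation consumes each nested stream entirely in the first hasNext() call that reaches it, that is, the call in which the corresponding A object is created.
So I gave up on flatMap() and concatenated all the streams produced by A.getBStream() using reduce(Stream::concat):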
public static void main(String[] args) {
    AtomicInteger x = new AtomicInteger();
    Iterator<C> it = Stream.generate(() -> null)
            .takeWhile(y -> x.incrementAndGet() < 5)
            .filter(y -> x.get() % 2 == 0)
            .map(a -> new A("A" + x.get()))
            .map(A::getBStream)
            .filter(Objects::nonNull)
            // reduce is a terminal operation: every A is created here
            .reduce(Stream::concat)
            .orElseGet(Stream::empty)
            .filter(Objects::nonNull)
            .map(B::toC)
            .iterator();
    while (it.hasNext()) {
        System.out.println("after hasNext()");
        C next = it.next();
        System.out.println(next);
    }
}
This produces the following output:
> created A2
> created A4
>> created A2->B0
>>> created A2->B0+C
after hasNext()
A2->B0+C
>> created A2->B1
>>> created A2->B1+C
after hasNext()
A2->B1+C
>> created A2->B2
>>> created A2->B2+C
after hasNext()
A2->B2+C
>> created A2->B3
>>> created A2->B3+C
after hasNext()
A2->B3+C
>> created A2->B4
>> created A4->B0
>>> created A4->B0+C
after hasNext()
A4->B0+C
>> created A4->B1
>>> created A4->B1+C
after hasNext()
A4->B1+C
>> created A4->B2
>>> created A4->B2+C
after hasNext()
A4->B2+C
>> created A4->B3
>>> created A4->B3+C
after hasNext()
A4->B3+C
>> created A4->B4
The price paid is that A2 and A4 are created up front, but all the B objects are created lazily.
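A hand-written iterator is another way to keep both levels lazy without depending on how Stream.iterator() handles flatMap. What follows is only a sketch under assumptions: LazyFlatIterator and its demo() method are hypothetical names, not JDK classes, and plain strings stand in for the A, B and C objects. It takes one outer element at a time and asks the current inner iterator for a single element per call. It also avoids repeated Stream.concat calls, which the Stream.concat documentation warns can lead to deep call chains or even a StackOverflowError.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical helper (not part of the JDK): flattens an outer iterator lazily.
final class LazyFlatIterator<T, R> implements Iterator<R> {
    private final Iterator<T> outer;
    private final Function<T, Iterator<R>> mapper;
    private Iterator<R> inner = Collections.emptyIterator();

    LazyFlatIterator(Iterator<T> outer, Function<T, Iterator<R>> mapper) {
        this.outer = outer;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        // Move to the next outer element only when the current inner iterator is exhausted.
        while (!inner.hasNext() && outer.hasNext()) {
            inner = mapper.apply(outer.next());
        }
        return inner.hasNext();
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return inner.next();
    }

    // Records creation and consumption events so the ordering can be inspected.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Iterator<String> it = new LazyFlatIterator<String, String>(
                Stream.of("A2", "A4")
                        .peek(a -> log.add("created " + a))
                        .iterator(),
                a -> Stream.of(1, 3)
                        .map(i -> a + "->B" + i)
                        .peek(b -> log.add("created " + b))
                        .iterator());
        while (it.hasNext()) {
            log.add("consumed " + it.next());
        }
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch each B string is created right before it is consumed, and A4 is not created until both elements derived from A2 have been consumed.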
Instead of creating an iterator and then doing something with each of its elements, you can simply use forEach and perform the action directly in the stream:
AtomicInteger x = new AtomicInteger();
Stream.generate(() -> null)
        .takeWhile(y -> x.incrementAndGet() < 5)
        .filter(y -> x.get() % 2 == 0)
        .map(n -> new A("A" + x.get()))
        .flatMap(A::getBStream)
        .filter(Objects::nonNull)
        .map(B::toC)
        .forEach(c -> {
            System.out.println("after (no) hasNext()");
            System.out.println(c);
        });
Output:
> created A2
>> created A2->B1
>>> created A2->B1+C
after (no) hasNext()
A2->B1+C
>> created A2->B3
>>> created A2->B3+C
after (no) hasNext()
A2->B3+C
> created A4
>> created A4->B1
>>> created A4->B1+C
after (no) hasNext()
A4->B1+C
>> created A4->B3
>>> created A4->B3+C
after (no) hasNext()
A4->B3+C
If you need to return the elements to a caller, you can simply return the stream itself:
AtomicInteger x = new AtomicInteger();
Stream<C> stream = Stream.generate(() -> null)
        .takeWhile(y -> x.incrementAndGet() < 5)
        .filter(y -> x.get() % 2 == 0)
        .map(n -> new A("A" + x.get()))
        .flatMap(A::getBStream)
        .filter(Objects::nonNull)
        .map(B::toC);
// return it here
stream.forEach(c -> {
    System.out.println("after (no) hasNext()");
    System.out.println(c);
});
The output is the same.
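To make the "return a stream" idea concrete, here is a small sketch under assumptions: ReturnStreamSketch and cStream are made-up names, and strings replace the A, B and C objects from the question. The method only assembles the pipeline; nothing runs until the caller invokes a terminal operation such as forEach, at which point the elements flow through one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class ReturnStreamSketch {
    // Hypothetical factory method: builds the pipeline but does not execute it.
    static Stream<String> cStream(List<String> log) {
        return Stream.of("A2", "A4")
                .flatMap(a -> Stream.of(1, 3).map(i -> a + "->B" + i))
                .peek(b -> log.add("created " + b))
                .map(b -> b + "+C");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Stream<String> stream = cStream(log);
        // Intermediate operations are lazy, so the log is still empty here.
        System.out.println("before terminal operation: " + log);
        // The caller decides what to do with each element.
        stream.forEach(c -> log.add("consumed " + c));
        log.forEach(System.out::println);
    }
}
```

The caller keeps control over the terminal operation, but unlike an iterator, forEach cannot be paused between elements.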