Collect pairs from a stream

I have a stream of objects like this:

"0", "1", "2", "3", "4", "5",

How can I convert it into a stream of pairs like this:

{ new Pair("0", "1"), new Pair("2", "3"), new Pair("4", "5")}.

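The size of the stream is unknown. I'm reading the data from a file that may be large. All I have is an iterator to collect from, and I convert this iterator into a stream with a spliterator. I know there is an answer that handles adjacent pairs with StreamEx: Collect successive pairs from a stream. Can this be done in plain Java or with StreamEx? Thanks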
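The iterator-to-stream step the question describes can be sketched with the JDK's Spliterators.spliteratorUnknownSize and StreamSupport.stream. In this sketch a List iterator stands in for the file iterator, and the class name IteratorToStream is only for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IteratorToStream {
    // Wraps an iterator of unknown length in a sequential, ordered stream.
    // Elements are pulled from the iterator lazily, one at a time.
    static <T> Stream<T> toStream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false); // false = sequential
    }

    public static void main(String[] args) {
        // Stand-in for an iterator over a (possibly large) file.
        Iterator<String> lines = List.of("0", "1", "2", "3", "4", "5").iterator();
        List<String> collected = toStream(lines).collect(Collectors.toList());
        System.out.println(collected); // [0, 1, 2, 3, 4, 5]
    }
}
```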

It's not a natural fit, but you can do it:

List input = ...
List<Pair> pairs = IntStream.range(0, input.size() / 2)
                            .map(i -> i * 2)
                            .mapToObj(i -> new Pair(input.get(i), input.get(i + 1)))
                            .collect(Collectors.toList());
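Here is a self-contained version of the index-based approach. Keep in mind that it needs the whole input in a random-access List, so it does not remove the memory concern from the question, and an odd trailing element is dropped because size() / 2 rounds down. The Pair class is a minimal stand-in for the question's Pair, which is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexPairs {
    // Minimal stand-in for the question's Pair type (hypothetical).
    static final class Pair<T> {
        final T first;
        final T second;

        Pair(T first, T second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Pairs up elements 0-1, 2-3, ... of a list that is already in memory.
    static <T> List<Pair<T>> pairs(List<T> input) {
        return IntStream.range(0, input.size() / 2)
                .mapToObj(i -> new Pair<T>(input.get(2 * i), input.get(2 * i + 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(pairs(List.of("0", "1", "2", "3", "4", "5")));
        // [(0,1), (2,3), (4,5)]
    }
}
```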

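To create the pairs as elements flow through the stream, you need a stateful lambda. That should generally be avoided, but it can be done. Note: this only works if the stream is single-threaded, i.e. not parallel.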

Stream<?> stream = 
assert !stream.isParallel();
Object[] last = { null };
List<Pair> pairs = stream.map(a -> {
        if (last[0] == null) {
            last[0] = a;
            return null;
        } else {
            Object t = last[0];
            last[0] = null;
            return new Pair(t, a);
        }
     }).filter(p -> p != null)
       .collect(Collectors.toList());
assert last[0] == null; // to check for an even number input.
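Below is a typed, self-contained variation of the same stateful map/filter idea. Instead of using null as the "no pending element" marker, it keeps a separate present flag, so null elements in the stream do not break the pairing. It also replaces the asserts with exceptions, because asserts only run with -ea. This is a sketch, not the answer's exact code, and Pair and Pending are hypothetical helper types.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatefulPairs {
    // Minimal stand-in for the question's Pair type (hypothetical).
    static final class Pair<T> {
        final T first;
        final T second;

        Pair(T first, T second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Mutable holder for the element that is still waiting for its partner.
    static final class Pending<T> {
        boolean present;
        T value;
    }

    // Stateful map + filter; rejects parallel streams and odd-length input.
    static <T> List<Pair<T>> pairs(Stream<T> stream) {
        if (stream.isParallel()) {
            throw new IllegalArgumentException("stateful pairing needs a sequential stream");
        }
        Pending<T> pending = new Pending<>();
        List<Pair<T>> result = stream.<Pair<T>>map(e -> {
                    if (!pending.present) {   // first half of a pair: remember it
                        pending.present = true;
                        pending.value = e;
                        return null;          // placeholder, removed by the filter
                    }
                    pending.present = false;  // second half: emit the pair
                    return new Pair<>(pending.value, e);
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        if (pending.present) {
            throw new IllegalArgumentException("odd number of elements");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Stream.of("0", "1", "2", "3", "4", "5")));
        // [(0,1), (2,3), (4,5)]
    }
}
```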

Just replace IntStream.range(1, 101) with your stream (you don't need to know the size of the stream):

import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class TestClass {

    public static void main(String[] args) {

        final Pair pair = new Pair();
        final List<Pair> pairList = new ArrayList<>();

        IntStream.range(1, 101)
                .map(i -> {
                    if (pair.a == null) {
                        pair.a = i;
                        return 0;
                    } else {
                        pair.b = i;
                        return 1;
                    }
                })
                .filter(i -> i == 1)
                .forEach(i -> {
                    pairList.add(new Pair(pair));
                    pair.reset();
                });

        pairList.stream().forEach(p -> System.out.print(p + " "));
    }

    static class Pair {
        public Object a;
        public Object b;

        public Pair() {
        }

        public Pair(Pair orig) {
            this.a = orig.a;
            this.b = orig.b;
        }

        void reset() {
            a = null;
            b = null;
        }

        @Override
        public String toString() {
            return "{" + a + "," + b + '}';
        }
    }

}
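The question reads from a file, so here is the same one-pending-element idea applied directly to BufferedReader.lines(). Each pair goes to a callback instead of into a list, so only one line is held at a time. This is a sketch under assumptions: forEachPair is a hypothetical helper, a StringReader stands in for the file, and an odd trailing line is silently dropped.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Stream;

class LinePairs {
    // Hypothetical helper: calls action once per non-overlapping pair of lines.
    // Only the pending line is buffered, so the input is never held in memory as a whole.
    // Stateful, so the stream is forced to be sequential; an odd trailing line is dropped.
    static void forEachPair(Stream<String> lines, BiConsumer<String, String> action) {
        String[] pending = {null}; // safe as a marker: BufferedReader.lines() never yields null
        lines.sequential().forEachOrdered(line -> {
            if (pending[0] == null) {
                pending[0] = line;               // first line of a pair: remember it
            } else {
                action.accept(pending[0], line); // second line: hand the pair over
                pending[0] = null;
            }
        });
    }

    public static void main(String[] args) {
        // A StringReader stands in for the file here.
        BufferedReader reader = new BufferedReader(new StringReader("0\n1\n2\n3\n4\n5\n"));
        List<String> out = new ArrayList<>();
        forEachPair(reader.lines(), (a, b) -> out.add(a + "-" + b));
        System.out.println(out); // [0-1, 2-3, 4-5]
    }
}
```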

If you don't want to collect the elements

The title of the question says collect pairs from a stream, so I assumed that you wanted to actually collect them, but you commented:

Your solution works, the problem is that it loads the data from file to PairList and then I may use stream from this collection to process pairs. I can't do it because the data might be too big to store in the memory.

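So here is a way to do it without collecting the elements.

It's relatively straightforward to convert an Iterator<T> into an Iterator<List<T>>, and from that to turn a stream into a stream of pairs.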

  /**
   * Returns an iterator over pairs of elements returned by the iterator.
   * 
   * @param iterator the base iterator
   * @return the paired iterator
   */
  public static <T> Iterator<List<T>> paired(Iterator<T> iterator) {
    return new Iterator<List<T>>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }

      @Override
      public List<T> next() {
        T first = iterator.next();
        if (iterator.hasNext()) {
          return Arrays.asList(first, iterator.next());
        } else {
          return Arrays.asList(first);
        }
      }
    };
  }

  /**
   * Returns a stream of pairs of elements from a stream.
   * 
   * @param stream the base stream
   * @return the pair stream
   */
  public static <T> Stream<List<T>> paired(Stream<T> stream) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(paired(stream.iterator()), Spliterator.ORDERED),
        false);
  }

  @Test
  public void iteratorAndStreamsExample() {
    List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f");
    Iterator<List<String>> pairs = paired(strings.iterator());
    while (pairs.hasNext()) {
      System.out.println(pairs.next());
      // [a, b]
      // [c, d]
      // [e, f]
    }

    paired(Stream.of(1, 2, 3, 4, 5, 6, 7, 8)).forEach(System.out::println);
    // [1, 2]
    // [3, 4]
    // [5, 6]
    // [7, 8]
  }
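Two properties of paired are worth seeing. With an odd number of elements, the last "pair" is a one-element list. Elements are also pulled on demand, so the source can even be infinite. The class below is a self-contained copy of the two methods above, slightly condensed, and the name PairedDemo is only for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class PairedDemo {
    // Each next() consumes up to two elements of the underlying iterator.
    static <T> Iterator<List<T>> paired(Iterator<T> it) {
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public List<T> next() {
                T first = it.next();
                return it.hasNext() ? Arrays.asList(first, it.next()) : Arrays.asList(first);
            }
        };
    }

    static <T> Stream<List<T>> paired(Stream<T> stream) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(paired(stream.iterator()), Spliterator.ORDERED),
                false);
    }

    public static void main(String[] args) {
        // Odd length: the last "pair" has only one element.
        System.out.println(paired(Stream.of(1, 2, 3)).collect(Collectors.toList()));
        // [[1, 2], [3]]

        // Lazy: an infinite source works, because pairs are only produced on demand.
        System.out.println(paired(Stream.iterate(0, i -> i + 1)).limit(3).collect(Collectors.toList()));
        // [[0, 1], [2, 3], [4, 5]]
    }
}
```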

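If you do want to collect the elements...

I'd do this by collecting into a list and using an AbstractList to provide a view of the elements as pairs.

First, PairList. This is a simple AbstractList wrapper around any list that has an even number of elements. (Once the desired behavior is specified, it could easily be adapted to handle odd-length lists.)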

  /**
   * A view on a list of its elements as pairs.
   * 
   * @param <T> the element type
   */
  static class PairList<T> extends AbstractList<List<T>> {
    private final List<T> elements;

    /**
     * Creates a new pair list.
     * 
     * @param elements the elements
     * 
     * @throws NullPointerException if elements is null
     * @throws IllegalArgumentException if the length of elements is not even
     */
    public PairList(List<T> elements) {
      Objects.requireNonNull(elements, "elements must not be null");
      this.elements = new ArrayList<>(elements);
      if (this.elements.size() % 2 != 0) {
        throw new IllegalArgumentException("number of elements must have even size");
      }
    }

    @Override
    public List<T> get(int index) {
      // pair i consists of elements 2i and 2i + 1
      return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
    }

    @Override
    public int size() {
      return elements.size() / 2;
    }
  }

Then we can define the collector we need. It's essentially shorthand for collectingAndThen(toList(), PairList::new):

  /**
   * Returns a collector that collects to a pair list.
   *
   * @return the collector
   */
  public static <E> Collector<E, ?, PairList<E>> toPairList() {
    return Collectors.collectingAndThen(Collectors.toList(), PairList::new);
  }

Note that it might be worthwhile to define a PairList constructor that doesn't defensively copy the list, for use cases where we know the backing list is freshly generated (as in this one). That isn't really necessary right now, though. But once we had one, this method would be collectingAndThen(toCollection(ArrayList::new), PairList::newNonDefensivelyCopiedPairList).

Now we can use it:

  /**
   * Creates a pair list with collectingAndThen, toList(), and PairList::new
   */
  @Test
  public void example() {
    List<List<Integer>> intPairs = Stream.of(1, 2, 3, 4, 5, 6)
        .collect(toPairList());
    System.out.println(intPairs); // [[1, 2], [3, 4], [5, 6]]

    List<List<String>> stringPairs = Stream.of("a", "b", "c", "d")
        .collect(toPairList());
    System.out.println(stringPairs); // [[a, b], [c, d]]
  }
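The note about a constructor that skips the defensive copy can be sketched as follows. The factory name newNonDefensivelyCopiedPairList is the hypothetical one from that note, not an existing API. This standalone PairList is written for illustration only. It is safe only because the ArrayList built by toCollection is never visible to anyone else.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Standalone variant of PairList whose factory wraps the list without copying it.
class PairList<T> extends AbstractList<List<T>> {
    private final List<T> elements;

    private PairList(List<T> elements) {
        if (elements.size() % 2 != 0) {
            throw new IllegalArgumentException("number of elements must be even");
        }
        this.elements = elements; // no copy: the caller must not modify the list afterwards
    }

    // Hypothetical factory; only safe when nobody else holds a reference to elements.
    static <E> PairList<E> newNonDefensivelyCopiedPairList(List<E> elements) {
        return new PairList<>(elements);
    }

    @Override
    public List<T> get(int index) {
        // Pair i is made of elements 2i and 2i + 1.
        return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
    }

    @Override
    public int size() {
        return elements.size() / 2;
    }

    // The ArrayList created by toCollection is never shared, so skipping the copy is safe.
    static <E> Collector<E, ?, PairList<E>> toPairList() {
        Collector<E, ?, ArrayList<E>> toArrayList = Collectors.toCollection(ArrayList::new);
        return Collectors.collectingAndThen(toArrayList, PairList::newNonDefensivelyCopiedPairList);
    }

    public static void main(String[] args) {
        PairList<Integer> pairs = Stream.of(1, 2, 3, 4, 5, 6).collect(toPairList());
        System.out.println(pairs); // [[1, 2], [3, 4], [5, 6]]
    }
}
```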

Here is a complete source file with a runnable example (as a JUnit test):

package ex;

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.junit.Test;

public class PairCollectors {

  /**
   * A view on a list of its elements as pairs.
   * 
   * @param <T> the element type
   */
  static class PairList<T> extends AbstractList<List<T>> {
    private final List<T> elements;

    /**
     * Creates a new pair list.
     * 
     * @param elements the elements
     * 
     * @throws NullPointerException if elements is null
     * @throws IllegalArgumentException if the length of elements is not even
     */
    public PairList(List<T> elements) {
      Objects.requireNonNull(elements, "elements must not be null");
      this.elements = new ArrayList<>(elements);
      if (this.elements.size() % 2 != 0) {
        throw new IllegalArgumentException("number of elements must have even size");
      }
    }

    @Override
    public List<T> get(int index) {
      // pair i consists of elements 2i and 2i + 1
      return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
    }

    @Override
    public int size() {
      return elements.size() / 2;
    }
  }

  /**
   * Returns a collector that collects to a pair list.
   *
   * @return the collector
   */
  public static <E> Collector<E, ?, PairList<E>> toPairList() {
    return Collectors.collectingAndThen(Collectors.toList(), PairList::new);
  }

  /**
   * Creates a pair list with collectingAndThen, toList(), and PairList::new
   */
  @Test
  public void example() {
    List<List<Integer>> intPairs = Stream.of(1, 2, 3, 4, 5, 6)
        .collect(toPairList());
    System.out.println(intPairs); // [[1, 2], [3, 4], [5, 6]]

    List<List<String>> stringPairs = Stream.of("a", "b", "c", "d")
        .collect(toPairList());
    System.out.println(stringPairs); // [[a, b], [c, d]]
  }
}

Assuming there is a Pair class with left and right fields, getters and a constructor:

 static class Paired<T> extends AbstractSpliterator<Pair<T>> {

    private List<T> list = new ArrayList<>(2);

    private final Iterator<T> iter;

    public Paired(Iterator<T> iter) {
        super(Long.MAX_VALUE, 0);
        this.iter = iter;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Pair<T>> consumer) {
        getBothIfPossible(iter);
        if (list.size() == 2) {
            consumer.accept(new Pair<>(list.remove(0), list.remove(0)));
            return true;
        }
        return false;
    }

    private void getBothIfPossible(Iterator<T> iter) {
        while (iter.hasNext() && list.size() < 2) {
            list.add(iter.next());
        }
    }

}

Usage would be:

 Iterator<Integer> iterator = List.of(1, 2, 3, 4, 5).iterator();
 Paired<Integer> p = new Paired<>(iterator);
 StreamSupport.stream(p, false)
            .forEach(pair -> System.out.println(pair.getLeft() + "  " + pair.getRight()));
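Here is a self-contained version you can run as-is. The Pair class is a minimal stand-in with getLeft/getRight, as the answer assumes. This tryAdvance reads the two elements directly instead of buffering them in a list, and it reports the ORDERED characteristic. Both are simplifications of the answer's code, not the answer itself. As in the answer, an odd trailing element (5 below) is dropped.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class PairedSpliteratorDemo {
    // Minimal Pair with left/right getters, as the answer assumes (hypothetical class).
    static final class Pair<T> {
        private final T left;
        private final T right;

        Pair(T left, T right) {
            this.left = left;
            this.right = right;
        }

        T getLeft() { return left; }
        T getRight() { return right; }
    }

    static final class Paired<T> extends Spliterators.AbstractSpliterator<Pair<T>> {
        private final Iterator<T> iter;

        Paired(Iterator<T> iter) {
            super(Long.MAX_VALUE, Spliterator.ORDERED); // Long.MAX_VALUE means "size unknown"
            this.iter = iter;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Pair<T>> consumer) {
            if (!iter.hasNext()) {
                return false;
            }
            T left = iter.next();
            if (!iter.hasNext()) {
                return false; // odd trailing element is dropped
            }
            consumer.accept(new Pair<>(left, iter.next()));
            return true;
        }
    }

    static List<String> describe(Iterator<Integer> it) {
        return StreamSupport.stream(new Paired<>(it), false)
                .map(p -> p.getLeft() + "  " + p.getRight())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describe(List.of(1, 2, 3, 4, 5).iterator()).forEach(System.out::println);
        // 1  2
        // 3  4
    }
}
```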

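I know I'm late to the party, but all of the answers seem either really complicated or to create a lot of GC overhead/short-lived objects (which isn't a big deal on modern JVMs). Why not simply do this?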

public class PairCollaterTest extends TestCase {
    static class PairCollater<T> implements Function<T, Stream<Pair<T, T>>> {
        T prev;

        @Override
        public Stream<Pair<T, T>> apply(T curr) {
            if (prev == null) {
                prev = curr;
                return Stream.empty();
            }
            try {
                return Stream.of(Pair.of(prev, curr));
            } finally {
                prev = null;
            }
        }
    }

    public void testPairCollater() {
        Stream.of("0", "1", "2", "3", "4", "5").sequential().flatMap(new PairCollater<>()).forEach(System.out::println);
    }
}

Prints:

(0,1)
(2,3)
(4,5)
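Here is a dependency-free sketch of the same flatMap idea. Pair.of in the answer presumably comes from a library such as Apache Commons Lang, whose Pair prints as (left,right), so a minimal stand-in Pair is defined here instead (an assumption). This version also uses a separate hasPrev flag, because the answer's prev == null check would misbehave if the stream contained null elements. Like the answer, it only works on a sequential stream, and each stream needs a fresh PairCollater.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapPairs {
    // Minimal stand-in for the Pair type used in the answer (hypothetical).
    static final class Pair<T> {
        final T left;
        final T right;

        Pair(T left, T right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + "," + right + ")";
        }
    }

    // Emits nothing for the first element of each pair and one Pair for the second.
    // Stateful: use one instance per stream, and only with sequential streams.
    static final class PairCollater<T> implements Function<T, Stream<Pair<T>>> {
        private T prev;
        private boolean hasPrev; // flag instead of a null check, so null elements also work

        @Override
        public Stream<Pair<T>> apply(T curr) {
            if (!hasPrev) {
                prev = curr;
                hasPrev = true;
                return Stream.empty();
            }
            Pair<T> pair = new Pair<>(prev, curr);
            prev = null;
            hasPrev = false;
            return Stream.of(pair);
        }
    }

    static <T> List<String> pairs(Stream<T> stream) {
        return stream.sequential()
                .flatMap(new PairCollater<T>())
                .map(Object::toString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        pairs(Stream.of("0", "1", "2", "3", "4", "5")).forEach(System.out::println);
        // (0,1)
        // (2,3)
        // (4,5)
    }
}
```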