Group objects using a stream collector

Question

I have a list of objects, let's say of class Document:

class Document {

    private final String id;
    private final int length;

    public Document(String id, int length) {
        this.id = id;
        this.length = length;
    }

    public int getLength() {
        return length;
    }
}

The task at hand is to group them into envelopes so that the number of pages (Document.length) in each envelope does not exceed a certain limit.

class Envelope {

    private final List<Document> documents = new ArrayList<>();

    public List<Document> getDocuments() {
        return documents;
    }
}

For example, if I have the following list of documents:

Document doc0 = new Document("doc0", 2);
Document doc1 = new Document("doc1", 5);
Document doc2 = new Document("doc2", 5);
Document doc3 = new Document("doc3", 5);

and the maximum number of pages per envelope is, say, 7, then I expect 3 envelopes containing the following documents:

Assert.assertEquals(3, envelopeList.size());

Assert.assertEquals(2, envelopeList.get(0).getDocuments().size()); // doc0, doc1
Assert.assertEquals(1, envelopeList.get(1).getDocuments().size()); // doc2
Assert.assertEquals(1, envelopeList.get(2).getDocuments().size()); // doc3

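I have already implemented this with a traditional for loop and a bunch of ifs. My question is whether this can be done more elegantly with streams and collectors.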

Thank you and best regards,

Dalibor

Answer 1

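To batch documents by length, we need to keep track of the accumulated length. Streams are not the best fit when external state has to be maintained; a plain loop is the simpler and more efficient option here.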
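For comparison, here is a minimal sketch of the plain-loop approach. It uses simplified stand-ins for the question's Document and Envelope classes, a hypothetical `pack` method and a hypothetical `maxPages` parameter; none of these come from the original post. The loop opens a new envelope whenever adding the next document would push the current one over the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopPacking {

    // Simplified stand-in for the question's Document class (assumption).
    static class Document {
        private final String id;
        private final int length;

        Document(String id, int length) {
            this.id = id;
            this.length = length;
        }

        int getLength() {
            return length;
        }
    }

    // Simplified stand-in for the question's Envelope class, plus a running page count (assumption).
    static class Envelope {
        private final List<Document> documents = new ArrayList<>();
        private int pages = 0;

        List<Document> getDocuments() {
            return documents;
        }
    }

    // Hypothetical helper: fills envelopes in encounter order and opens a new one
    // whenever the next document would exceed maxPages in the current envelope.
    static List<Envelope> pack(List<Document> docs, int maxPages) {
        List<Envelope> envelopes = new ArrayList<>();
        Envelope current = null;
        for (Document d : docs) {
            if (current == null || current.pages + d.getLength() > maxPages) {
                current = new Envelope();
                envelopes.add(current);
            }
            current.documents.add(d);
            current.pages += d.getLength();
        }
        return envelopes;
    }

    public static void main(String[] args) {
        List<Document> docs = List.of(
                new Document("doc0", 2), new Document("doc1", 5),
                new Document("doc2", 5), new Document("doc3", 5));
        System.out.println(pack(docs, 7).size()); // prints 3
    }
}
```

With the question's data this yields envelopes of 2, 1 and 1 documents. The loop keeps its state in local variables, which is exactly what the stream version below has to work around.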

If we force streams onto this scenario, DocumentSpliterator would change as follows:

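import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public static List<Couvert> splitDocuments(List<Document> docs) {

    // Stateful helper: tracks the pages accumulated in the current bucket and returns
    // the bucket index for each document. MAX (pages per Couvert) is a constant defined elsewhere.
    IntUnaryOperator helper = new IntUnaryOperator() {
        private int bucketIndex = 0;
        private int accumulated = 0;

        @Override
        public synchronized int applyAsInt(int length) {
            if (length + accumulated > MAX) {
                bucketIndex++;
                accumulated = 0;
            }
            accumulated += length;
            return bucketIndex;
        }
    };

    // The helper depends on encounter order, so this only works on a sequential stream.
    return new ArrayList<>(docs.stream()
                               .map(d -> new AbstractMap.SimpleEntry<>(helper.applyAsInt(d.getLength()), d))
                               .collect(Collectors.groupingBy(AbstractMap.SimpleEntry::getKey,
                                       TreeMap::new, // sorted by bucket index, so the Couverts come back in order
                                       Collector.of(Couvert::new,
                                               (c, e) -> c.getDocuments().add(e.getValue()),
                                               (c1, c2) -> {c1.getDocuments().addAll(c2.getDocuments());return c1;})))
                               .values());
}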

Explanation:

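The helper keeps the accumulated length and hands out a new bucket index when adding a document would exceed MAX. I used the IntUnaryOperator interface here, but any interface that accepts an int argument and returns an int would do.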
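To see what the helper produces on its own, the sketch below feeds the question's page counts (2, 5, 5, 5) through the same bucket-index logic with MAX = 7. The class name and the `bucketIndices` method are hypothetical. The operator body mirrors the answer's helper, except that `synchronized` is dropped because only one sequential stream calls it.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

public class BucketIndexDemo {

    static final int MAX = 7; // maximum pages per envelope, as in the question's example

    // Hypothetical helper: returns the bucket index assigned to each length, in encounter order.
    static int[] bucketIndices(int[] lengths) {
        IntUnaryOperator helper = new IntUnaryOperator() {
            private int bucketIndex = 0;
            private int accumulated = 0;

            @Override
            public int applyAsInt(int length) {
                // Start a new bucket when this length would overflow the current one.
                if (length + accumulated > MAX) {
                    bucketIndex++;
                    accumulated = 0;
                }
                accumulated += length;
                return bucketIndex;
            }
        };
        // Arrays.stream(int[]) is sequential, so the helper sees the lengths in order.
        return Arrays.stream(lengths).map(helper).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bucketIndices(new int[] {2, 5, 5, 5})));
        // prints [0, 0, 1, 2]
    }
}
```

Documents that share an index end up in the same Couvert after `groupingBy`: doc0 and doc1 in bucket 0, doc2 in bucket 1, and doc3 in bucket 2.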
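Regarding the stream:
- Each Document is mapped to a SimpleEntry holding its bucketIndex and the Document itself.
- The stream of SimpleEntry objects is first grouped by bucketIndex. A downstream Collector turns the Documents of each bucketIndex into a Couvert. The result of collect() is a Map<Integer, Couvert>.
- Finally, the Collection of Couvert values is copied into a list and returned.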

Note: for this implementation, I removed the front parameter and made it part of the docs list.
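A different option, not part of the answer above, is to keep the state inside the collector's own mutable container instead of in an external operator. The sketch below builds a hypothetical `toEnvelopes(maxPages)` collector with `Collector.of`. Each inner `List<Document>` stands in for an envelope, and `Document` is a simplified stand-in for the question's class. Greedy packing depends on encounter order and cannot be merged from independently packed chunks, so the combiner refuses to run. The collector is therefore meant for sequential streams only (assumption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

public class PackingCollectorDemo {

    // Simplified stand-in for the question's Document class (assumption).
    static class Document {
        final String id;
        final int length;

        Document(String id, int length) {
            this.id = id;
            this.length = length;
        }
    }

    // Mutable container: the envelopes built so far plus the page count of the last one.
    static class Packing {
        final List<List<Document>> envelopes = new ArrayList<>();
        int lastPages = 0;
    }

    // Hypothetical collector: greedily fills envelopes in encounter order.
    static Collector<Document, Packing, List<List<Document>>> toEnvelopes(int maxPages) {
        return Collector.of(
                Packing::new,
                (p, d) -> {
                    // Open a new envelope if there is none yet or this document would overflow it.
                    if (p.envelopes.isEmpty() || p.lastPages + d.length > maxPages) {
                        p.envelopes.add(new ArrayList<>());
                        p.lastPages = 0;
                    }
                    p.envelopes.get(p.envelopes.size() - 1).add(d);
                    p.lastPages += d.length;
                },
                (left, right) -> {
                    // Partial greedy packings cannot be combined correctly, so reject parallel use.
                    throw new UnsupportedOperationException("sequential streams only");
                },
                p -> p.envelopes);
    }

    public static void main(String[] args) {
        List<List<Document>> envelopes = List.of(
                new Document("doc0", 2), new Document("doc1", 5),
                new Document("doc2", 5), new Document("doc3", 5))
                .stream()
                .collect(toEnvelopes(7));
        System.out.println(envelopes.size()); // prints 3
    }
}
```

Because the page count lives in the container, no side-effecting `map` step is needed. The trade-off is that the collector only works on sequential streams.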


java

java-stream

collectors