Is there something like an Iterator, but with functions like Streams?


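  1. Load a batch of data from the database
  2. Map that data (the Object[] query results) to a class that represents the data in a readable format
  3. Write it to a file
  4. Repeat until the query has no more results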



// Disclaimer: "Something" is the structure I am not sure of yet.
// Could be an Iterator or something else that fits (that's the question)
public class Orchestrator {
    private DataGetter dataGetter;

    public void doWork() {
        FileWriter writer = new FileWriter("filename");

        // Write the formatted data to the file
        dataGetter.getData()
            .forEach(data -> writer.writeToFile(data));
    }
}

public class FileWriter {
    public void writeToFile(List<Thing> data) {
        // Write to file
    }
}

public class DataGetter {
    private ThingDao thingDao;

    public Something<List<Thing>> getData() {

        // Map data to the correct format and return that
        return thingDao.getThings()
            .map(partialResult -> /* map to object */);
    }
}

public class ThingDao {

    public Something<List<Object[]>> getThings() {
        Query q = ...;
        // Don't know what to return
    }
}


import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class QIterator<E> implements Iterator<List<E>> {
    public static String QUERY_OFFSET = "queryOffset";
    public static String QUERY_LIMIT = "queryLimit";

    private Query query;

    private long lastResultIndex = 0;
    private long batchSize;
    private boolean exhausted = false; // set once a batch comes back short

    private Function<List<Object>, List<E>> mapper;

    public QIterator(Query query, long batchSize) {
        this.query = query;
        this.batchSize = batchSize;
    }

    public QIterator(Query query, long batchSize, Function<List<Object>, List<E>> mapper) {
        this(query, batchSize);
        this.mapper = mapper;
    }

    public boolean hasNext() {
        // A batch shorter than batchSize means the query has no more results
        return !exhausted;
    }

    public List<E> next() {
        query.setParameter(QIterator.QUERY_OFFSET, lastResultIndex);
        query.setParameter(QIterator.QUERY_LIMIT, batchSize);

        List<Object> result = (List<Object>) query.getResultList(); // unchecked
        lastResultIndex += result.size();
        exhausted = result.size() < batchSize;

        List<E> mappedResult;
        if (mapper != null) {
            mappedResult = mapper.apply(result);
        } else {
            mappedResult = (List<E>) result; // unchecked
        }

        return mappedResult;
    }

    // Returns a new iterator whose batches are additionally passed through appendingMapper
    public <R> QIterator<R> map(Function<List<E>, List<R>> appendingMapper) {
        return new QIterator<>(query, batchSize, (data) -> {
            if (this.mapper != null) {
                return appendingMapper.apply(this.mapper.apply(data));
            } else {
                return appendingMapper.apply((List<E>) data); // unchecked
            }
        });
    }

    // Passes each batch to the consumer together with its zero-based batch index
    public void forEach(BiConsumer<List<E>, Integer> consumer) {
        for (int i = 0; this.hasNext(); i++) {
            consumer.accept(this.next(), i);
        }
    }
}
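So far this works, but there are some unchecked casts that I don't really like. I would also like to be able to "append" one QIterator to another. That in itself is not hard, but the result should also apply the maps that are chained after the append.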

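Let's say you have a DAO that provides data in a paginated manner, e.g. by applying LIMIT and OFFSET clauses to the underlying SQL. Such a DAO class would have a method that takes those values as arguments, i.e. the method would conform to the following functional method: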

public interface PagedDao<T> {
    List<T> getData(int offset, int limit);
}

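E.g. calling getData(0, 20) would return the first 20 rows (page 1), and calling getData(60, 20) would return the 20 rows on page 4. If the method returns fewer than 20 rows, it means we got the last page. Asking for data after the last row will return an empty list.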
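The termination rule described above (a page shorter than the requested limit is the last page) can be sketched as a plain loop, before any Stream machinery gets involved. This is only an illustration under assumptions: `PagingLoopDemo`, `fetchAll`, and the in-memory `page` method are hypothetical names made up for the example, standing in for a real paged DAO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class PagingLoopDemo {

    // Hypothetical stand-in for a paged DAO: serves rows "Row #1".."Row #rowCount".
    static List<String> page(int rowCount, int offset, int limit) {
        List<String> rows = new ArrayList<>();
        for (int rowNo = offset + 1; rowNo <= rowCount && rows.size() < limit; rowNo++)
            rows.add("Row #" + rowNo);
        return rows;
    }

    // Fetches page after page until a page comes back shorter than requested.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> getData, int limit) {
        List<T> all = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<T> block = getData.apply(offset, limit);
            all.addAll(block);
            if (block.size() < limit)
                break; // short (or empty) page: that was the last one
            offset += block.size();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> rows = fetchAll((offset, limit) -> page(13, offset, limit), 5);
        System.out.println(rows.size() + " rows, last = " + rows.get(rows.size() - 1));
        // prints "13 rows, last = Row #13"
    }
}
```

Note that this loop collects every row into one list, so it only shows the paging contract; it does not give the one-block-at-a-time memory behaviour the question asks for.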

For the demo below, we can mock such a DAO class:

public class MockDao {
    private final int rowCount;
    public MockDao(int rowCount) {
        this.rowCount = rowCount;
    }
    public List<SimpleRow> getSimpleRows(int offset, int limit) {
        System.out.println("DEBUG: getData(" + offset + ", " + limit + ")");
        if (offset < 0 || limit <= 0)
            throw new IllegalArgumentException();
        List<SimpleRow> data = new ArrayList<>();
        for (int i = 0, rowNo = offset + 1; i < limit && rowNo <= this.rowCount; i++, rowNo++)
            data.add(new SimpleRow("Row #" + rowNo));
        System.out.println("DEBUG:   data = " + data);
        return data;
    }
}

public class SimpleRow {
    private final String data;
    public SimpleRow(String data) {
        this.data = data;
    }
    @Override
    public String toString() {
        return "Row[data=" + this.data + "]";
    }
}

If you want to generate a Stream of rows from that method, streaming all rows in blocks of a certain size, we need a Spliterator for that, so we can use StreamSupport.stream(Spliterator<T> spliterator, boolean parallel) to create the stream.

Here is an implementation of such a Spliterator:

import java.util.List;
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;

public class PagedDaoSpliterator<T> implements Spliterator<T> {
    private final PagedDao<T> dao;
    private final int blockSize;
    private int nextOffset;
    private List<T> data;
    private int dataIdx;
    public PagedDaoSpliterator(PagedDao<T> dao, int blockSize) {
        if (blockSize <= 0)
            throw new IllegalArgumentException();
        this.dao = Objects.requireNonNull(dao);
        this.blockSize = blockSize;
    }
    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (this.data == null) {
            if (this.nextOffset == -1/*At end*/)
                return false; // Already at end
            // Fetch the next block
            this.data = this.dao.getData(this.nextOffset, this.blockSize);
            this.dataIdx = 0;
            if (this.data.size() < this.blockSize)
                this.nextOffset = -1/*At end, after this data*/;
            else
                this.nextOffset += data.size();
            if (this.data.isEmpty()) {
                this.data = null;
                return false; // At end
            }
        }
        // Hand out one row of the current block
        action.accept(this.data.get(this.dataIdx++));
        if (this.dataIdx == this.data.size())
            this.data = null; // Block used up, fetch a new one on the next call
        return true;
    }
    @Override
    public Spliterator<T> trySplit() {
        return null; // Parallel processing not supported
    }
    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // Unknown
    }
    @Override
    public int characteristics() {
        return ORDERED | NONNULL;
    }
}

We can now test that using the mock DAO above:

MockDao dao = new MockDao(13);
Stream<SimpleRow> stream = StreamSupport.stream(
        new PagedDaoSpliterator<>(dao::getSimpleRows, 5), /*parallel*/false);
stream.forEach(System.out::println);

Output

DEBUG: getData(0, 5)
DEBUG:   data = [Row[data=Row #1], Row[data=Row #2], Row[data=Row #3], Row[data=Row #4], Row[data=Row #5]]
Row[data=Row #1]
Row[data=Row #2]
Row[data=Row #3]
Row[data=Row #4]
Row[data=Row #5]
DEBUG: getData(5, 5)
DEBUG:   data = [Row[data=Row #6], Row[data=Row #7], Row[data=Row #8], Row[data=Row #9], Row[data=Row #10]]
Row[data=Row #6]
Row[data=Row #7]
Row[data=Row #8]
Row[data=Row #9]
Row[data=Row #10]
DEBUG: getData(10, 5)
DEBUG:   data = [Row[data=Row #11], Row[data=Row #12], Row[data=Row #13]]
Row[data=Row #11]
Row[data=Row #12]
Row[data=Row #13]

As can be seen, we get 13 rows of data, retrieved from the database in blocks of 5 rows.
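With a stream of rows available, the workflow from the question (load in blocks, map, write to a file) becomes an ordinary stream pipeline. The following is a rough, self-contained sketch rather than the code above: to keep it short it swaps in `Spliterators.AbstractSpliterator` instead of a hand-written `Spliterator`, and `PagedExportDemo`, `page`, `pagedStream`, `export`, and `runDemo` are all hypothetical names. The bracket formatting stands in for whatever mapping the real data needs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class PagedExportDemo {

    // Hypothetical paged source serving "Row #1".."Row #rowCount".
    static List<String> page(int rowCount, int offset, int limit) {
        List<String> rows = new ArrayList<>();
        for (int rowNo = offset + 1; rowNo <= rowCount && rows.size() < limit; rowNo++)
            rows.add("Row #" + rowNo);
        return rows;
    }

    // Lazily streams every row, asking the source for one block at a time.
    static <T> Stream<T> pagedStream(BiFunction<Integer, Integer, List<T>> getData, int blockSize) {
        Spliterator<T> spliterator =
                new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
            private Iterator<T> block = Collections.emptyIterator();
            private int offset = 0;
            private boolean lastBlockSeen = false;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                while (!block.hasNext()) {
                    if (lastBlockSeen)
                        return false;
                    List<T> data = getData.apply(offset, blockSize);
                    offset += data.size();
                    lastBlockSeen = data.size() < blockSize; // short page = last page
                    block = data.iterator();
                }
                action.accept(block.next());
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }

    // Maps each row to an output line and writes it to the target file.
    static void export(Stream<String> rows, Path target) {
        try (BufferedWriter writer = Files.newBufferedWriter(target)) {
            rows.map(row -> "[" + row + "]").forEachOrdered(line -> {
                try {
                    writer.write(line);
                    writer.newLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the whole pipeline against a temp file and returns the lines written.
    static List<String> runDemo(int rowCount, int blockSize) {
        try {
            Path target = Files.createTempFile("rows", ".txt");
            export(pagedStream((offset, limit) -> page(rowCount, offset, limit), blockSize), target);
            List<String> lines = Files.readAllLines(target);
            Files.delete(target);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = runDemo(13, 5);
        System.out.println(lines.size() + " lines, first = " + lines.get(0));
        // prints "13 lines, first = [Row #1]"
    }
}
```

Because the stream is sequential and pulls rows on demand, at most one block is held in memory while the file is being written.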



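Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(queryThatReturnsAllRowsOrdered);
Stream.generate(() -> {
      try { return rs.next() ? map(rs) : null; } // null marks the end of the ResultSet
      catch (SQLException e) { throw new RuntimeException(e); }
  })
  .takeWhile(Objects::nonNull) // Java 9+: stop at the first null
  .filter(<some predicate>)
  .forEach(<some operation>);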


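This approach keeps only one row in memory at a time, and it minimizes the database load by running just 1 query.

Mapping from a ResultSet is much simpler and more natural than mapping from an Object[], because you can access columns by name and get correctly typed values, for example: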

MyDao map(ResultSet rs) {
    try {
        String someStr = rs.getString("COLUMN_X");
        int someInt = rs.getInt("COLUMN_Y");
        return new MyDao(someStr, someInt);
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
}
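The one-row-at-a-time idea can also be written as a small Spliterator over the ResultSet, so the stream ends by itself when rs.next() returns false and no null sentinel is needed. This is a sketch under assumptions, not an established API: `ResultSetStreams`, `stream`, and `RowMapper` are hypothetical names, and the caller is assumed to keep the ResultSet open until the stream has been fully consumed and to close it afterwards.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ResultSetStreams {

    // Hypothetical mapper type: like Function, but allowed to throw SQLException.
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    // Streams the rows of an open ResultSet, mapping each row as it is read.
    // The caller owns the ResultSet and closes it after consuming the stream.
    static <T> Stream<T> stream(ResultSet rs, RowMapper<T> mapper) {
        Spliterator<T> spliterator =
                new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                try {
                    if (!rs.next())
                        return false; // no more rows: the stream ends here
                    action.accept(mapper.map(rs));
                    return true;
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        return StreamSupport.stream(spliterator, false);
    }
}
```

A call could then look like `ResultSetStreams.stream(rs, row -> new MyDao(row.getString("COLUMN_X"), row.getInt("COLUMN_Y")))`, followed by the usual filter and forEach steps.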