Spring GCP - Datastore performance: batch processing, iterating through the full entity list is very slow

The following code runs very slowly; processing 400 entities takes almost 30 seconds:

    int page = 0;
    org.springframework.data.domain.Page<MyEntity> slice = null;
    while (true) {
        if (slice == null) {
            slice = repo.findAll(PageRequest.of(page, 400, Sort.by("date")));
        } else {
            slice = repo.findAll(slice.nextPageable());
        }
        slice.getContent().forEach(v -> v.setApp(SApplication.NAME_XXX));
        repo.saveAll(slice.getContent());
        LOGGER.info("processed: " + page);
        page++;
        // Check hasNext() only after processing, so the last page is not skipped.
        if (!slice.hasNext()) {
            break;
        }
    }

I switched to the following approach, which takes 4-6 seconds per 400 entities (it uses the GCP client library for Datastore directly):

    Datastore service = DatastoreOptions.getDefaultInstance().getService();
    StructuredQuery.Builder<?> query = Query.newEntityQueryBuilder();
    int limit = 400;
    query.setKind("ENTITY_KIND").setLimit(limit);

    int count = 0;
    Cursor cursor = null;
    while (true) {
        if (cursor != null) {
            query.setStartCursor(cursor);
        }
        QueryResults<?> queryResult = service.run(query.build());

        List<Entity> entityList = new ArrayList<>();
        while (queryResult.hasNext()) {
            Entity loadEntity = (Entity) queryResult.next();
            Entity.Builder newEntity = Entity.newBuilder(loadEntity).set("app", SApplication.NAME_XXX.name());
            entityList.add(newEntity.build());
        }
        service.put(entityList.toArray(new Entity[0]));
        count += entityList.size();

        if (entityList.size() == limit) {
            cursor = queryResult.getCursorAfter();
        } else {
            break;
        }
        LOGGER.info("Processed: {}", count);
    }

Why can't I use Spring for batch processing?

The full discussion is here: https://github.com/spring-cloud/spring-cloud-gcp/issues/1824

First:

You need to use the correct library version: at least 1.2.0.M2.

Second:

You need to add a new method to the repository interface:

    @Query("select * from your_kind")
    Slice<TestEntity> findAllSlice(Pageable pageable);
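
A likely reason the Slice variant helps, assuming it pages with a cursor rather than an offset: the Datastore documentation notes that entities skipped by an offset are still retrieved internally and add latency, while a cursor resumes where the previous query stopped. The toy model below is not Spring or Datastore code. It only counts how many rows each paging style would walk, and the class and method names are made up for illustration.

```java
/** Toy model only: counts rows walked per paging style. Not a Datastore or Spring API. */
public class PagingCost {

    /** Offset paging: each request walks past `offset` rows, then returns up to `pageSize` rows. */
    static long rowsTouchedWithOffset(int total, int pageSize) {
        long touched = 0;
        for (int offset = 0; offset < total; offset += pageSize) {
            int returned = Math.min(pageSize, total - offset);
            touched += offset + returned; // skipped rows plus returned rows
        }
        return touched;
    }

    /** Cursor paging: each request resumes exactly where the previous one stopped. */
    static long rowsTouchedWithCursor(int total, int pageSize) {
        long touched = 0;
        int cursor = 0;
        while (cursor < total) {
            int returned = Math.min(pageSize, total - cursor);
            touched += returned; // only the returned rows are walked
            cursor += returned;
        }
        return touched;
    }

    public static void main(String[] args) {
        int total = 4000;
        int pageSize = 400;
        System.out.println("offset: " + rowsTouchedWithOffset(total, pageSize)); // offset: 22000
        System.out.println("cursor: " + rowsTouchedWithCursor(total, pageSize)); // cursor: 4000
    }
}
```

In this model the offset cost grows roughly quadratically with the number of pages, while the cursor cost stays linear. Real latency also depends on network round trips and on how the write batches are sent, so treat the numbers as an illustration only.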

The final code looks like this:

    LOGGER.info("start");
    int page = 0;
    Slice<TestEntity> slice = null;
    while (true) {
        if (slice == null) {
            slice = repo.findAllSlice(DatastorePageable.of(page, 400, Sort.by("date")));
        } else {
            slice = repo.findAllSlice(slice.nextPageable());
        }
        slice.getContent().forEach(v -> v.setApp("xx"));
        repo.saveAll(slice.getContent());
        LOGGER.info("processed: {}", page);
        page++;
        // Check hasNext() only after processing, so the last slice is not skipped.
        if (!slice.hasNext()) {
            break;
        }
    }
    LOGGER.info("end");
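
One detail worth checking in a loop like this is that the final slice, the one for which `hasNext()` is false, still gets processed. Below is a minimal, self-contained sketch of that loop shape. `Batch` and `fetch` are hypothetical in-memory stand-ins for `Slice` and the repository call, not Spring types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SliceLoop {

    /** Hypothetical stand-in for a Spring Data Slice: one batch plus a "has next" flag. */
    static final class Batch {
        final List<Integer> content;
        final boolean hasNext;

        Batch(List<Integer> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    /** Hypothetical stand-in for the repository call: returns page `page` of `data`. */
    static Batch fetch(List<Integer> data, int page, int size) {
        int from = Math.min(page * size, data.size());
        int to = Math.min(from + size, data.size());
        return new Batch(data.subList(from, to), to < data.size());
    }

    /** Visits every batch, including the last one, and returns the number of items handled. */
    static int processAll(List<Integer> data, int size, Consumer<List<Integer>> handler) {
        int processed = 0;
        int page = 0;
        Batch batch;
        do {
            batch = fetch(data, page++, size);
            handler.accept(batch.content);
            processed += batch.content.size();
        } while (batch.hasNext); // checked after processing, so the last batch is handled too
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        // Three batches are visited: 400 + 400 + 200.
        System.out.println(processAll(data, 400, batch -> { })); // prints 1000
    }
}
```

The `do ... while` form fetches and processes first and tests for a next batch afterwards, which is the property the repository loop needs as well.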