如何使用 JpaRepository 进行批量(多行)插入?

How to do bulk (multi row) inserts with JpaRepository?

当从服务层使用长 List<Entity> 调用我的 JpaRepositorysaveAll 方法时,Hibernate 的跟踪日志记录显示每个实体发出单个 SQL 语句.

我可以强制它执行批量插入(即多行)而不需要手动 fiddle 和 EntityManger、事务等甚至原始 SQL 语句字符串吗?

对于多行插入,我的意思是不只是从以下位置过渡:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

至:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

而是:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

在 PROD 中,我使用的是 CockroachDB,性能差异很大。

下面是重现问题的最小示例(为简单起见,H2)。


./src/main/kotlin/ThingService.kt:

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts:

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

运行:

./gradlew bootRun

触发数据库插入:

curl http://localhost:8080/test_trigger

日志输出:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

您可以配置 Hibernate 来执行批量 DML。看看 Spring Data JPA - concurrent Bulk inserts/updates。我认为答案的第 2 部分可以解决您的问题:

Enable the batching of DML statements Enabling the batching support would result in less number of round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

更新:您必须在 application.properties 文件中以不同方式设置休眠属性。它们在命名空间下:spring.jpa.properties.*。示例如下所示:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

要使用 Spring Boot 和 Spring Data JPA 进行批量插入,您只需要两件事:

  1. 将选项 spring.jpa.properties.hibernate.jdbc.batch_size 设置为您需要的适当值(例如:20)。

  2. 使用您的存储库的 saveAll() 方法和准备插入的实体列表。

工作示例是 here

关于将insert语句改成这样:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

PostgreSQL 中可用:您可以在 jdbc 连接字符串中将选项 reWriteBatchedInserts 设置为 true:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

然后 jdbc 驱动程序将执行 this transformation

有关批处理的其他信息,您可以找到 here

已更新

Kotlin 中的演示项目:sb-kotlin-batch-insert-demo

已更新

Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.

潜在的问题是 SimpleJpaRepository 中的以下代码:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

除了批处理大小 属性 设置之外,您还必须确保 class SimpleJpaRepository 调用持续存在而不是合并。有几种方法可以解决此问题:使用不查询序列的 @Id 生成器,例如

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

或者通过让您的实体实现 Persistable 并覆盖 isNew() 调用

来强制持久性将记录视为新记录
@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

或者覆盖 save(List) 并使用实体管理器调用 persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

以上代码基于以下链接:

所有提到的方法都有效,但会很慢,特别是如果插入数据的来源位于其他 table。首先,即使使用 batch_size>1,插入操作也会在多个 SQL 查询中执行。其次,如果源数据位于另一个 table 中,则需要使用其他查询获取数据(并且在最坏的情况下将所有数据加载到内存中),并将其转换为静态批量插入。第三,通过单独 persist() 调用每个实体(即使启用了批处理),您将使用所有这些实体实例膨胀实体管理器一级缓存。

但是 Hibernate 还有另一种选择。如果您将 Hibernate 用作 JPA 提供程序,则可以回退到 HQL,后者 supports bulk inserts 本机使用来自另一个 table 的子选择。示例:

Session session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

这是否有效取决于您的 ID 生成策略。如果Entity.id是数据库生成的(比如MySQL自增),则执行成功。如果 Entity.id 是由您的代码生成的(对于 UUID 生成器尤其如此),它将失败并出现“不支持的 ID 生成方法”异常。

然而,在后一种情况下,这个问题可以通过自定义 SQL 函数来解决。例如在 PostgreSQL 中我使用 uuid-ossp 扩展提供 uuid_generate_v4() 功能,我最终在我的自定义对话框中注册:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

然后我将此 class 注册为休眠对话框:

hibernate.dialect=MyPostgresDialect

我终于可以在批量插入查询中使用这个函数了:

SessionImpl session = entityManager.unwrap(Session::class.java);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

最重要的是Hibernate生成的底层SQL来完成这个操作,它只是一个查询:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]