Need to insert 100000 rows in mysql using hibernate in under 5 seconds
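I am trying to insert 100,000 rows into a MySQL table in under 5 seconds using Hibernate (JPA). I have tried every trick Hibernate offers, but I still cannot do better than 35 seconds.
First optimization: I started with the IDENTITY sequence generator, which resulted in an insert time of 60 seconds. I later abandoned the sequence generator and started assigning the @Id field myself by reading MAX(id) and using AtomicInteger.incrementAndGet(). That reduced the insert time to 35 seconds.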
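For readers who want to see what assigning the IDs by hand can look like, here is a minimal sketch. The class name InMemoryIdAllocator and its constructor argument are hypothetical and not taken from the question; the only parts that come from the post are the idea of seeding from MAX(id) and the call to AtomicInteger.incrementAndGet().

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out primary-key values in memory, continuing from the table's current MAX(id). */
final class InMemoryIdAllocator {
    private final AtomicInteger lastId;

    /**
     * @param currentMaxId value of "SELECT MAX(id) ..." read once before the import
     *                     (assumption: 0 is passed for an empty table)
     */
    InMemoryIdAllocator(int currentMaxId) {
        this.lastId = new AtomicInteger(currentMaxId);
    }

    /** Thread-safe: every caller gets a distinct, strictly increasing value. */
    int nextId() {
        return lastId.incrementAndGet();
    }
}
```

Each entity would then get its @Id from nextId() before persist() is called. This is only safe while a single process writes to the table; a second writer reading the same MAX(id) would hand out colliding IDs.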
Second optimization: I enabled batch inserts by adding
<prop key="hibernate.jdbc.batch_size">30</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="hibernate.jdbc.batch_versioned_data">true</prop>
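to the configuration. I was shocked to find that batch inserts did absolutely nothing to reduce the insert time. Still 35 seconds!
Now I am thinking of trying to insert using multiple threads.
Does anyone have any pointers? Should I have chosen MongoDB?
Below is my configuration:
1. Hibernate configuration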
`
<bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="packagesToScan" value="com.progresssoft.manishkr" />
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
</property>
<property name="jpaProperties">
<props>
<prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
<prop key="hibernate.dialect">${hibernate.dialect}</prop>
<prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
<prop key="hibernate.format_sql">${hibernate.format_sql}</prop>
<prop key="hibernate.jdbc.batch_size">30</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="hibernate.jdbc.batch_versioned_data">true</prop>
</props>
</property>
</bean>
<bean class="org.springframework.jdbc.datasource.DriverManagerDataSource"
id="dataSource">
<property name="driverClassName" value="${database.driver}"></property>
<property name="url" value="${database.url}"></property>
<property name="username" value="${database.username}"></property>
<property name="password" value="${database.password}"></property>
</bean>
<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactoryBean" />
</bean>
<tx:annotation-driven transaction-manager="transactionManager" />
`
2. Entity configuration:
`
@Entity
@Table(name = "myEntity")
public class MyEntity {
@Id
private Integer id;
@Column(name = "deal_id")
private String dealId;
....
....
@Temporal(TemporalType.TIMESTAMP)
@Column(name = "timestamp")
private Date timestamp;
@Column(name = "amount")
private BigDecimal amount;
@OneToOne(cascade = CascadeType.ALL)
@JoinColumn(name = "source_file")
private MyFile sourceFile;
public Deal(Integer id,String dealId, ....., Timestamp timestamp, BigDecimal amount, SourceFile sourceFile) {
this.id = id;
this.dealId = dealId;
...
...
...
this.amount = amount;
this.sourceFile = sourceFile;
}
public String getDealId() {
return dealId;
}
public void setDealId(String dealId) {
this.dealId = dealId;
}
...
...
....
public BigDecimal getAmount() {
return amount;
}
public void setAmount(BigDecimal amount) {
this.amount = amount;
}
....
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
`
3. Persistence code (service):
`
@Service
@Transactional
public class ServiceImpl implements MyService{
@Autowired
private MyDao dao;
....
void foo(){
for(MyObject d : listOfObjects_100000){
dao.persist(d);
}
}
`
4. DAO class:
`
@Repository
public class DaoImpl implements MyDao{
@PersistenceContext
private EntityManager em;
public void persist(Deal deal){
em.persist(deal);
}
}
`
Logs:
`
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:32.906 [http-nio-8080-exec-2]
...
...
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27
18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection
`
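Oof.
There is a lot you can do to increase speed.
1.) Use @DynamicInsert and @DynamicUpdate so that INSERT statements include only non-null columns and UPDATE statements include only changed columns.
2.) Try inserting the rows directly into your database (without Hibernate) to see whether Hibernate is really your bottleneck.
3.) Use a SessionFactory and commit your transaction only every 100 inserts. Or open and close the transaction only once and flush your data every 100 inserts.
4.) Use the ID generation strategy "sequence" and let Hibernate preallocate IDs (via the allocationSize parameter).
5.) Use caches.
Some of these possible solutions can cost time instead of saving it when they are not used correctly. But you have plenty of options.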
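To make point 4 a bit more concrete, here is a small simulation of why preallocating IDs in blocks helps. It is not Hibernate's generator: BlockIdAllocator and FakeSequence are hypothetical names, and FakeSequence only stands in for a database sequence so that the number of round trips can be counted. In JPA the block size corresponds to the allocationSize attribute of @SequenceGenerator.

```java
/** Simulates block-wise ID preallocation; a sketch, not Hibernate's implementation. */
final class BlockIdAllocator {

    /** Hypothetical stand-in for a database sequence; each call models one round trip. */
    static final class FakeSequence {
        private long next = 1;
        private int roundTrips = 0;

        /** Reserves {@code size} consecutive IDs and returns the first one. */
        synchronized long reserveBlock(int size) {
            roundTrips++;
            long first = next;
            next += size;
            return first;
        }

        synchronized int roundTrips() {
            return roundTrips;
        }
    }

    private final FakeSequence sequence;
    private final int allocationSize;
    private long nextId = 0;   // next ID to hand out from the current block
    private long blockEnd = 0; // exclusive end of the current block

    BlockIdAllocator(FakeSequence sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    /** Only touches the "database" when the current block is used up. */
    synchronized long nextId() {
        if (nextId == blockEnd) {
            nextId = sequence.reserveBlock(allocationSize);
            blockEnd = nextId + allocationSize;
        }
        return nextId++;
    }
}
```

With a block size of 50, handing out 100,000 IDs costs 2,000 simulated round trips instead of 100,000. Whether this helps in the setup from the question is an open point, since MySQL has no native sequences and Hibernate would have to emulate one.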
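You are using Spring to manage the transaction, but you break that by using thread as the current session context. When Spring manages your transactions, don't mess around with the hibernate.current_session_context_class property. Remove it.
Don't use DriverManagerDataSource; use a proper connection pool such as HikariCP.
In your for loop you should flush and clear the EntityManager at regular intervals, preferably the same interval as your batch size. If you don't, a single persist takes longer and longer, because each time Hibernate checks the first-level cache for dirty objects, and the more objects it holds, the longer that takes. With 10 or 100 objects that is acceptable, but checking tens of thousands of objects for every persist will take its toll.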
`
@Service
@Transactional
public class ServiceImpl implements MyService{
    @Autowired
    private MyDao dao;
    @PersistenceContext
    private EntityManager em;

    void foo(){
        int count = 0;
        for(MyObject d : listOfObjects_100000){
            dao.persist(d);
            count++;
            // flush and clear at the same interval as hibernate.jdbc.batch_size (30)
            if ( (count % 30) == 0) {
                em.flush();
                em.clear();
            }
        }
    }
}
`
Another option to consider is StatelessSession:
A command-oriented API for performing bulk operations against a
database.
A stateless session does not implement a first-level cache nor
interact with any second-level cache, nor does it implement
transactional write-behind or automatic dirty checking, nor do
operations cascade to associated instances. Collections are ignored by
a stateless session. Operations performed via a stateless session
bypass Hibernate's event model and interceptors. Stateless sessions
are vulnerable to data aliasing effects, due to the lack of a
first-level cache.
For certain kinds of transactions, a stateless session may perform
slightly faster than a stateful session.
Related discussion:
Using StatelessSession for Batch processing
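After trying all possible solutions, I finally found a way to insert 100,000 rows in under 5 seconds!
Things I tried:
1) Replaced Hibernate's/the database's AUTO_INCREMENT/GENERATED IDs with self-generated IDs using AtomicInteger
2) Enabled batch inserts with batch_size=50
3) Flushed the cache after every 'batch_size' persist() calls
4) Multithreading (did not attempt this one)
What finally worked was using a native multi-row insert query and inserting 1000 rows in one SQL insert statement instead of calling persist() on every entity. To insert 100,000 entities, I created a native query like this: "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1000 rows inserted in one SQL insert query]
Now inserting 100,000 records takes around 3 seconds! So the bottleneck was the ORM itself! For bulk inserts, the only thing that seems to work is native insert queries!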
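The answer does not show its code, so the following is only a sketch of how such a multi-row insert could be put together. It uses plain JDBC with bound parameters rather than the JPA native query the answer describes, and the names MultiRowInsert, buildSql and insertAll are hypothetical. The table and column names in the usage note are borrowed from the log output in the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class MultiRowInsert {

    /** Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rowCount rows. */
    static String buildSql(String table, List<String> columns, int rowCount) {
        if (rowCount < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        // Table and column names are concatenated as-is, so they must be trusted constants.
        String oneRow = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        String allRows = String.join(", ", Collections.nCopies(rowCount, oneRow));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + allRows;
    }

    /**
     * Inserts all rows, at most rowsPerStatement per SQL statement.
     * Each Object[] holds one row's values in the same order as columns.
     * Returns the number of statements executed.
     */
    static int insertAll(Connection conn, String table, List<String> columns,
                         List<Object[]> rows, int rowsPerStatement) throws SQLException {
        int statements = 0;
        for (int from = 0; from < rows.size(); from += rowsPerStatement) {
            int to = Math.min(from + rowsPerStatement, rows.size());
            List<Object[]> chunk = rows.subList(from, to);
            try (PreparedStatement ps = conn.prepareStatement(buildSql(table, columns, chunk.size()))) {
                int index = 1; // JDBC parameters are 1-based
                for (Object[] row : chunk) {
                    for (Object value : row) {
                        ps.setObject(index++, value);
                    }
                }
                ps.executeUpdate();
            }
            statements++;
        }
        return statements;
    }
}
```

Called as insertAll(conn, "deal", columns, rows, 1000), this would send 100 statements for 100,000 rows. Very large statements can run into MySQL's max_allowed_packet limit, so the chunk size may need tuning. MySQL Connector/J also has a rewriteBatchedStatements connection property that is meant to do a similar rewrite for ordinary JDBC batches; it is not mentioned in the thread, so treat it as something to test rather than a confirmed fix.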