Batch-Insert into Postgres DB using Hibernate/Spring Data: 60K Rows Takes 2 Minutes, which is unacceptable
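I need to insert 60K rows into a Postgres DB from my Java/Spring application, using Hibernate/Spring Data.
The incoming INSERT data goes into (1) USERS_T, and (2) each associated new user must also be in STUDY_PARTICIPANTS_T. Both tables receive 60K records.
The code below works correctly, but performance is terrible: 60K rows take 2 minutes. Note that I'm filling out Hibernate entities and then calling saveAll on lists of size 1000.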
UsersT user = new UsersT();
user.setUsername(study.getAbbreviation().toUpperCase()+subjectId);
user.setRoleTypeId(new LookupT(150));
user.setCreatedDate(new Date());
//...
List<StudyParticipantsT> participants = new ArrayList<StudyParticipantsT>();
StudyParticipantsT sp = new StudyParticipantsT();
sp.setStudyT(study);
sp.setUsersT(user);
sp.setSubjectId(subjectId);
sp.setLocked("N");
participants.add(sp);
user.setStudyParticipantsTs(participants);
// Add to Batch-Insert List; if list size ready for batch-insert, or if at the end of all subjectIds, do Batch-Insert saveAll() and clear the list
batchInsertUsers.add(user);
if (batchInsertUsers.size() == 1000 || i == subjectIds.size() - 1) {
    // Log this Batch-Insert
    if (log.isDebugEnabled()) {
        log.debug("createParticipantsAccounts() Batch-Insert: Saving " + batchInsertUsers.size() + " records");
    }
    userDAO.saveAll(batchInsertUsers);
    // Reset list
    batchInsertUsers.clear();
}
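I found a thread where someone had the same problem, and the only solution they found was to compose a custom Native-SQL INSERT (..), (..), (..) string for each chunk of 1000 and run it manually, removing the ORM/Hibernate layer.
But my INSERTs involve some joined tables. I could take the time to rewrite all of these entity statements into custom SQL myself, but that is not trivial.
Are there any other solutions? I'm using
- Spring 5.0.2
- Hibernate 5.2.12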
We solved this by using Spring JDBC's jdbcTemplate.batchUpdate (no Hibernate) and reserving the Sequence IDs for any foreign keys in advance.
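We didn't go as far as an actual N-times-repeating INSERT statement, as the other poster mentioned above did; we're still using a framework approach (JdbcTemplate), but at least we're no longer using Hibernate/ORM. This approach is fast, though not super-fast like the N-times-repeating INSERT, and it is acceptable for now.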
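For comparison, the N-times-repeating INSERT that we did not adopt boils down to generating one statement with a placeholder group per row. Below is a minimal sketch of that string construction; the class name MultiRowInsertSql and the method build are hypothetical, not part of our code or of any library.

```java
import java.util.Collections;

// Hypothetical helper: illustrates the multi-row "INSERT ... VALUES (..), (..)" form only.
final class MultiRowInsertSql {

    /**
     * Builds "INSERT INTO table (c1, c2) VALUES (?, ?), (?, ?)" with one
     * placeholder group per row. Table and column names are concatenated as-is,
     * so they must come from trusted code, never from user input.
     */
    static String build(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group sized to the column count
        String group = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        // Repeat that group once per row
        String values = String.join(", ", Collections.nCopies(rowCount, group));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + values;
    }

    public static void main(String[] args) {
        // prints: INSERT INTO PUBLIC.STUDY_PARTICIPANTS_T (id, study_id, subject_id) VALUES (?, ?, ?), (?, ?, ?)
        System.out.println(build("PUBLIC.STUDY_PARTICIPANTS_T",
                new String[] {"id", "study_id", "subject_id"}, 2));
    }
}
```

The values would then be bound row by row, in column order. The PostgreSQL JDBC driver also offers a reWriteBatchedInserts connection property that rewrites batched INSERTs into this multi-row form; we did not try it here, so treat it as something to measure rather than a recommendation.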
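The actual Spring JDBC Batch-Insert happens via jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {..}), and we split the batches ourselves: BatchPreparedStatementSetter does not split anything for us automatically, it only submits one specific batch of a predetermined size.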
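The manual splitting amounts to cutting the full list into chunks of at most the batch size and submitting each chunk separately. Here is a small stand-alone sketch of just that step, with no Spring dependency; BatchSplitter and split are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows only the chunking, not the JDBC submission.
final class BatchSplitter {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        // prints the batch sizes 1000, 1000, 500 on separate lines
        for (List<Integer> batch : split(ids, 1000)) {
            System.out.println(batch.size());
        }
    }
}
```

Our method below does the same thing inline by filling a list and flushing it whenever it reaches the batch size. JdbcTemplate also has an overload, batchUpdate(String sql, Collection<T> batchArgs, int batchSize, ParameterizedPreparedStatementSetter<T> pss), that takes a batch size and does the splitting internally; we did not use it here.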
/**
* The following method performs a Native-SQL Batch-Insert of Participant accounts (using JdbcTemplate) to improve performance.
* Each Participant account requires 2 INSERTs: (1) USERS_T, (2) STUDY_PARTICIPANTS_T (with an FK reference to USERS_T.ID).
* Since there is no easy way to track the Sequence IDs between the two Batch-Inserts, we reserve the ID range for both tables, and
* then manually calculate our own IDs for USERS_T and STUDY_PARTICIPANTS_T ourselves.
* Initially, domain objects are filled out; then they are added to the Batch List that we submit and clear ourselves.
* (Originally the Batch-Insert was implemented with Hibernate/HQL, but due to slow performance it was nativized with jdbcTemplate.)
*
* NOTE: The entire method is @Transactional and all data will be rolled back in case of any exceptions in this method (rollbackFor=Exception.class).
* The updated Sequence values (set during reservation) will not be rolled back in this case, but Sequence gaps are normal.
*/
@Override
@Transactional(readOnly = false, rollbackFor = Exception.class)
public void createParticipantsAccounts(long studyId, List<String> subjectIds) throws Exception {
    int maxInsertParticipantsBatchSize = 1000; // Batch size is 1000

    /*
       We need to insert into 2 tables, USERS_T and STUDY_PARTICIPANTS_T.
       The table STUDY_PARTICIPANTS_T has an FK dependency on USERS_T.ID.
       Since there is no easy way to track the Sequence IDs between the two Batch-Inserts, we reserve the ID range for both tables,
       and then manually calculate our own IDs for USERS_T and STUDY_PARTICIPANTS_T ourselves.
       The Sequences are immediately updated to the calculated final values to reserve the range.
    */

    // 1. Obtain current Sequence values
    Integer currUsersTSeqVal = userDAO.getCurrentUsersTSeqVal();
    Integer currStudyParticipantsTSeqVal = studyParticipantsDAO.getCurrentStudyParticipantsTSeqVal();

    // 2. Immediately update the Sequences to the calculated final value (this reserves the ID range immediately)
    // In Postgres, updating the Sequences is: SELECT setval('users_t_id_seq', :val)
    userDAO.setCurrentUsersTSeqVal(currUsersTSeqVal + subjectIds.size());
    studyParticipantsDAO.setCurrentStudyParticipantsTSeqVal(currStudyParticipantsTSeqVal + subjectIds.size());

    // List for Batch-Inserts, maintained and submitted by ourselves in accordance with our batch size
    List<UsersT> batchInsertUsers = new ArrayList<UsersT>();

    // NOTE: "study" is the entity for studyId; loading it is omitted from this excerpt.
    for (int i = 0; i < subjectIds.size(); i++) {
        String subjectId = subjectIds.get(i);

        // Prepare domain object (UsersT with associated StudyParticipantsT) to be used in the Native-SQL jdbcTemplate batchUpdate
        UsersT user = new UsersT();
        user.setId(currUsersTSeqVal + 1 + i); // Set ID to calculated value
        user.setUsername(study.getAbbreviation().toUpperCase() + subjectId);
        user.setActiveFlag(true);
        // etc., fill out object, then subobject:
        List<StudyParticipantsT> participants = new ArrayList<StudyParticipantsT>();
        StudyParticipantsT sp = new StudyParticipantsT();
        sp.setId(currStudyParticipantsTSeqVal + 1 + i); // Set ID to calculated value
        // ...etc.
        participants.add(sp); // as in the question's code; the second Batch-Insert reads get(0)
        user.setStudyParticipantsTs(participants);

        // Add to Batch-Insert List of Users
        batchInsertUsers.add(user);

        // If list size ready for Batch-Insert, or if at the end of all subjectIds, perform Batch Insert (both tables) and clear list
        if (batchInsertUsers.size() == maxInsertParticipantsBatchSize || i == subjectIds.size() - 1) {
            // Part 1: Insert batch into USERS_T
            nativeBatchInsertUsers(jdbcTemplate, batchInsertUsers);
            // Part 2: Insert batch into STUDY_PARTICIPANTS_T
            nativeBatchInsertStudyParticipants(jdbcTemplate, batchInsertUsers);
            // Reset list
            batchInsertUsers.clear();
        }
    }
}
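The ID arithmetic used above can be isolated into a few lines. The following is a sketch of the reservation logic only, assuming the "current" sequence value is the last ID already handed out; ReservedIdRange and its methods are hypothetical names, and no database is involved.

```java
// Hypothetical helper: mirrors the "currSeqVal + 1 + i" arithmetic from the method above.
final class ReservedIdRange {
    private final long currentSeqVal; // last value the sequence has handed out (assumption)
    private final int count;          // number of rows we are about to insert

    ReservedIdRange(long currentSeqVal, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
        this.currentSeqVal = currentSeqVal;
        this.count = count;
    }

    /** Value to pass to setval(...) so that the whole range counts as used. */
    long newSequenceValue() {
        return currentSeqVal + count;
    }

    /** ID for the i-th row (0-based) inside the reserved range. */
    long idAt(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("row index outside reserved range: " + i);
        }
        return currentSeqVal + 1 + i;
    }

    public static void main(String[] args) {
        ReservedIdRange users = new ReservedIdRange(100, 3);
        // prints: 101..103, setval to 103
        System.out.println(users.idAt(0) + ".." + users.idAt(2) + ", setval to " + users.newSequenceValue());
    }
}
```

One caveat worth checking in your own setup: reading the sequence and then calling setval are two separate steps, so another session that calls nextval in between could receive an ID inside the reserved range. If concurrent writers are possible, that window needs to be closed some other way.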
The actual sub-methods for the Batch-Insert:
/**
* Native-SQL Batch-Insert into USERS_T for Participant Upload.
* NOTE: This method is part of its Parent's @Transactional. (Note also that we need "final" on the List param for Inner-Class access to this variable.)
*
* @param jdbcTemplate
* @param batchInsertUsers
*/
private void nativeBatchInsertUsers(JdbcTemplate jdbcTemplate, final List<UsersT> batchInsertUsers) {
    String sqlInsert = "INSERT INTO PUBLIC.USERS_T (id, password, user_name, created_by, created_date, last_changed_by, last_changed_date, " +
            "first_name, last_name, organization, phone, lockout_date, lockout_counter, last_login_date, " +
            "password_last_changed_date, temporary_password, active_flag, uuid, " +
            "role_type_id, ws_account_researcher_id) " +
            "VALUES (?, ?, ?, ?, ?, ?, ?, " +
            "?, ?, ?, ?, ?, ?, ?, " +
            "?, ?, ?, ?, " +
            "?, ?" +
            ") ";
    jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {
        @Override
        public int getBatchSize() {
            return batchInsertUsers.size();
        }

        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ps.setInt(1, batchInsertUsers.get(i).getId()); // ID (provided by ourselves)
            // etc., set PS for each i-th object
        }
    });
}
/**
* Native-SQL Batch-Insert into STUDY_PARTICIPANTS_T for Participant Upload.
* NOTE: This method is part of its Parent's @Transactional. (Note also that we need "final" on the List param for Inner-Class access to this variable.)
*
* @param jdbcTemplate
* @param batchInsertUsers
*/
private void nativeBatchInsertStudyParticipants(JdbcTemplate jdbcTemplate, final List<UsersT> batchInsertUsers) {
    String sqlInsert = "INSERT INTO PUBLIC.STUDY_PARTICIPANTS_T (id, study_id, subject_id, user_id, locked, " +
            "created_by, created_date, last_changed_by, last_changed_date) " +
            "VALUES (?, ?, ?, ?, ?, " +
            "?, ?, ?, ? " +
            ") ";
    jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {
        @Override
        public int getBatchSize() {
            return batchInsertUsers.size();
        }

        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ps.setInt(1, batchInsertUsers.get(i).getStudyParticipantsTs().get(0).getId()); // ID (provided by ourselves)
            // etc.
        }
    });
}