Does creating job instances concurrently cause a MySQL deadlock under the default isolation level?

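Recently we ran a test that created different Spring Batch job instances concurrently (for example, 10 threads in parallel, with job names that are similar but distinct, e.g. sharing the same prefix). This easily triggers a deadlock in MySQL. The reported error is: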

org.springframework.dao.DeadlockLoserDataAccessException: PreparedStatementCallback; SQL [INSERT into BATCH_JOB_INSTANCE(JOB_INSTANCE_ID, JOB_NAME, JOB_KEY, VERSION) values (?, ?, ?, ?)]; Deadlock found when trying to get lock; try restarting transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:267)
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
    at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1443)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:633)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:862)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:917)
    at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:922)
    at org.springframework.batch.core.repository.dao.JdbcJobInstanceDao.createJobInstance(JdbcJobInstanceDao.java:120)
    at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:140)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)

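We searched the existing deadlock reports and found that some are specific to SQL Server, such as this one: (https://github.com/spring-projects/spring-batch/issues/1448). After analyzing the isolation level used when creating a job (SERIALIZABLE) and the order of operations, we believe the deadlock is triggered as follows: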

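1. Before creating a job instance, the code first queries the batch_job_instance table to check whether the instance already exists (about 3 times). Under SERIALIZABLE, these reads take shared next-key locks (https://dev.mysql.com/doc/refman/5.7/en/innodb-next-key-locking.html) in MySQL, which lock the range of records related to the job name.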

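2. Thread 2 wants to create job2 and insert a row into batch_job_instance, and thread 3 wants to do the same. Both threads hold the same shared next-key lock, and the rows they need to insert fall inside the locked key range, so each insert waits for the other thread's lock and a deadlock occurs.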
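To make the two steps above concrete, here is a toy model in plain Java. It is only a sketch of the reasoning, not how InnoDB is implemented: it assumes a single rule, namely that an insert into a key range must wait for shared range locks held by other transactions. The class name RangeLockSketch and the transaction labels tx2 and tx3 are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical class): it mimics one locking rule and is
// not a description of InnoDB internals.
class RangeLockSketch {
    // Transactions currently holding a shared lock on the same key range.
    private final Set<String> sharedHolders = new LinkedHashSet<>();
    // Wait-for graph: transaction -> transactions it is blocked by.
    private final Map<String, Set<String>> waitsFor = new LinkedHashMap<>();

    // Step 1: the existence check under SERIALIZABLE takes a shared range lock.
    public void lockingRead(String tx) {
        sharedHolders.add(tx);
    }

    // Step 2: an insert into the range is blocked by every OTHER shared holder.
    // Returns true if the insert can proceed immediately.
    public boolean tryInsert(String tx) {
        Set<String> blockers = new LinkedHashSet<>(sharedHolders);
        blockers.remove(tx); // a transaction never waits on its own lock
        if (blockers.isEmpty()) {
            waitsFor.remove(tx);
            return true;
        }
        waitsFor.put(tx, blockers);
        return false;
    }

    // A transaction is deadlocked if it can reach itself in the wait-for graph.
    public boolean isDeadlocked(String tx) {
        return reaches(tx, tx, new LinkedHashSet<String>());
    }

    private boolean reaches(String from, String target, Set<String> seen) {
        Set<String> next = waitsFor.getOrDefault(from, Collections.<String>emptySet());
        for (String candidate : next) {
            if (candidate.equals(target)) {
                return true;
            }
            if (seen.add(candidate) && reaches(candidate, target, seen)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RangeLockSketch locks = new RangeLockSketch();
        locks.lockingRead("tx2"); // thread 2 checks whether job2 exists
        locks.lockingRead("tx3"); // thread 3 checks whether job3 exists
        System.out.println("tx2 insert proceeds: " + locks.tryInsert("tx2"));
        System.out.println("tx3 insert proceeds: " + locks.tryInsert("tx3"));
        System.out.println("deadlock: " + locks.isDeadlocked("tx2"));
    }
}
```

In this model a single transaction that reads and then inserts is never blocked; the cycle only appears once two transactions have both taken the shared lock before either inserts, which matches the interleaving described in the steps.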

Following the link here (https://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/core/repository/support/AbstractJobRepositoryFactoryBean.html#setIsolationLevelForCreate-java.lang.String-), we tried changing the isolation level to REPEATABLE_READ, and this worked without any deadlock.

So the key question here is:

Is setting the isolation level to REPEATABLE_READ the recommended solution here, and does it have any side effects, given that it is not the default option?

Many thanks!

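Yes, that is the way to go. If SERIALIZABLE is too aggressive, you can use a less aggressive isolation level for the job repository. That is why setIsolationLevelForCreate is provided. This is in fact documented in its Javadoc: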

ISOLATION_REPEATABLE_READ would work as well
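For reference, a minimal configuration sketch of what this could look like. It assumes Spring Batch 4.x with Java configuration and @EnableBatchProcessing, and that a DataSource bean already exists. The class name RepeatableReadBatchConfigurer is made up for illustration; the setter is the setIsolationLevelForCreate discussed above.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.stereotype.Component;

// Hypothetical class name; assumes Spring Batch 4.x, where
// DefaultBatchConfigurer is the extension point for the job repository.
@Component
public class RepeatableReadBatchConfigurer extends DefaultBatchConfigurer {

    private final DataSource dataSource;

    public RepeatableReadBatchConfigurer(DataSource dataSource) {
        super(dataSource);
        this.dataSource = dataSource;
    }

    @Override
    protected JobRepository createJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(getTransactionManager());
        // Use REPEATABLE_READ instead of the default SERIALIZABLE when
        // creating job instances and executions.
        factory.setIsolationLevelForCreate("ISOLATION_REPEATABLE_READ");
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```

If the project uses XML configuration or a different Spring Batch major version, the same property is set on the job repository definition instead; the wiring above is only one possible arrangement.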