Postgres deadlock with (select for share + insert) and (select for update + update)
I have the following table (all examples use the psycopg2 Python library):
CREATE TABLE IF NOT EXISTS TeamTasks
(
    id SERIAL PRIMARY KEY,
    round INTEGER,
    task_id INTEGER,
    team_id INTEGER,
    score FLOAT DEFAULT 0,
    UNIQUE (round, task_id, team_id)
)
I have two functions:
- One copies the previous round's TeamTasks into the new round; there are no updates between the SELECT and the INSERT. It's implemented like this:
query = """
WITH prev_table AS (
SELECT score FROM teamtasks
WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
ORDER BY round DESC LIMIT 1
FOR SHARE
)
INSERT INTO TeamTasks (task_id, team_id, round, score)
SELECT %(task_id)s, %(team_id)s, %(round)s, score
FROM prev_table;
"""
with aux.db_cursor() as (conn, curs):
    for team_id in range(team_count):
        for task_id in range(task_count):
            curs.execute(
                query,
                {
                    'task_id': task_id,
                    'team_id': team_id,
                    'round': cur_round + 1,
                },
            )
    conn.commit()
aux.db_cursor is just a convenience wrapper that acquires a psycopg2 connection and cursor.
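Roughly, it does something like the sketch below (just an illustration, not the real implementation; the connection string is a placeholder):

from contextlib import contextmanager

import psycopg2


@contextmanager
def db_cursor():
    # Illustrative sketch only: open a connection, hand back (conn, cursor),
    # and close the connection when the block exits.
    conn = psycopg2.connect("dbname=placeholder")  # placeholder DSN
    try:
        yield conn, conn.cursor()
    finally:
        conn.close()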
- The other updates the TeamTasks rows for two specific teams and a specific task across several rounds. It's implemented like this:
# I have team1_id, team2_id and task_id
query1 = "SELECT score from teamtasks WHERE team_id=%s AND task_id=%s AND round=%s FOR NO KEY UPDATE"
query2 = "UPDATE teamtasks SET score = %s WHERE team_id=%s AND task_id=%s AND round >= %s"
with aux.db_cursor() as (conn, curs):
    curs.execute(query1, (team1_id, task_id, cur_round))
    score1, = curs.fetchone()
    curs.execute(query1, (team2_id, task_id, cur_round))
    score2, = curs.fetchone()
    sleep(0.1)  # Here happens something time-consuming
    curs.execute(query2, (score1 + 0.1, team1_id, task_id, cur_round))
    curs.execute(query2, (score2 - 0.1, team2_id, task_id, cur_round))
    conn.commit()
I can guarantee that in the second function each team can be the subject of only one update at a time, so the teams involved in concurrent updates are always distinct.
Also, the first function runs rarely, and nothing besides these two functions updates this table, so the locks in the first function are there precisely to ensure the table doesn't change while TeamTasks is being copied.
In this environment I'm getting lots of deadlocks like the following:
postgres_1 | 2019-11-17 20:43:08.510 UTC [49] ERROR: deadlock detected
postgres_1 | 2019-11-17 20:43:08.510 UTC [49] DETAIL: Process 49 waits for ShareLock on transaction 685; blocked by process 65.
postgres_1 | Process 65 waits for ShareLock on transaction 658; blocked by process 49.
postgres_1 | Process 49:
postgres_1 | WITH prev_table AS (
postgres_1 | SELECT score FROM teamtasks
postgres_1 | WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1 | ORDER BY round DESC LIMIT 1
postgres_1 | FOR SHARE
postgres_1 | )
postgres_1 | INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1 | SELECT 8, 6, 1, score
postgres_1 | FROM prev_table;
postgres_1 |
postgres_1 | Process 65: SELECT score from teamtasks WHERE team_id=0 AND task_id=8 AND round=0 FOR NO KEY UPDATE
postgres_1 | 2019-11-17 20:43:08.510 UTC [49] HINT: See server log for query details.
postgres_1 | 2019-11-17 20:43:08.510 UTC [49] CONTEXT: while locking tuple (0,69) in relation "teamtasks"
postgres_1 | 2019-11-17 20:43:08.510 UTC [49] STATEMENT:
postgres_1 | WITH prev_table AS (
postgres_1 | SELECT score FROM teamtasks
postgres_1 | WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1 | ORDER BY round DESC LIMIT 1
postgres_1 | FOR SHARE
postgres_1 | )
postgres_1 | INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1 | SELECT 8, 6, 1, score
postgres_1 | FROM prev_table;
How do I fix these deadlocks? Is there a neat solution I'm not seeing?
The SELECT ... FOR SHARE seems unnecessary here. That syntax exists to preserve referential integrity. In your case you are selecting from and inserting into the same teamtasks table, so you are holding locks on it unnecessarily, which is what makes your two connections block each other (and ultimately it would probably be best to refactor the code so you use only one connection, but I don't know how feasible that is for you). As far as I can tell, SELECT ... FOR SHARE is more about updates and referential integrity involving other tables than about inserts into the same table.
The problem is that in the first aux.db_cursor() call you take FOR SHARE locks as you loop over range(team_count) and range(task_count), while in the second aux.db_cursor() call you do something time-consuming before issuing the UPDATEs on some of those rows; those UPDATE lock requests then collide with the FOR SHARE locks. I would get rid of the FOR SHARE locks unless you really need them (and at that point, if possible, I would look for a way to consolidate it all into a single database connection).
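For illustration, here is roughly what the copy query from your first function would look like with the FOR SHARE dropped (an untested sketch; the surrounding loop and commit stay exactly as they are):

# Same query as in the question, with the row-level lock removed.
query = """
WITH prev_table AS (
    SELECT score FROM teamtasks
    WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
    ORDER BY round DESC LIMIT 1
)
INSERT INTO TeamTasks (task_id, team_id, round, score)
SELECT %(task_id)s, %(team_id)s, %(round)s, score
FROM prev_table;
"""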
Disclosure: I work for EnterpriseDB (EDB).