Postgres deadlock with (select for share + insert) and (select for update + update)

I have the following table (all examples use the psycopg2 Python library):

CREATE TABLE IF NOT EXISTS TeamTasks
        (
            id              SERIAL PRIMARY KEY,
            round           INTEGER,
            task_id         INTEGER,
            team_id         INTEGER,
            score           FLOAT         DEFAULT 0,
            UNIQUE (round, task_id, team_id)
        )

I have two functions:

  1. Copy the TeamTasks rows of the previous round into a new round, with no updates between the SELECT and the INSERT. The function is implemented as follows:
query = """
    WITH prev_table AS (
        SELECT score FROM teamtasks 
        WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
        ORDER BY round DESC LIMIT 1 
        FOR SHARE
    )
    INSERT INTO TeamTasks (task_id, team_id, round, score) 
    SELECT %(task_id)s, %(team_id)s, %(round)s, score
    FROM prev_table;
"""

with aux.db_cursor() as (conn, curs):    
    for team_id in range(team_count):
        for task_id in range(task_count): 
            curs.execute(
                query, 
                {
                    'task_id': task_id,
                    'team_id': team_id,
                    'round': cur_round + 1,
                },
            )
    conn.commit()

aux.db_cursor is just a convenience wrapper that acquires a psycopg2 connection and cursor.
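The wrapper itself isn't shown in the question; a minimal sketch of what such a wrapper could look like, assuming plain DB-API semantics (the `connect` parameter and the default DSN are hypothetical, not from the original code):

```python
from contextlib import contextmanager


@contextmanager
def db_cursor(connect=None):
    """Yield a (connection, cursor) pair and close both afterwards.

    `connect` is a zero-argument callable returning a DB-API connection.
    By default it would be something like psycopg2.connect(<your DSN>).
    """
    if connect is None:
        import psycopg2  # hypothetical default; replace the DSN with your own
        connect = lambda: psycopg2.connect("dbname=game")
    conn = connect()
    try:
        curs = conn.cursor()
        try:
            yield conn, curs
        finally:
            curs.close()
    finally:
        conn.close()
```

Note that a wrapper like this opens a fresh connection per `with` block, which is exactly why the two functions in the question end up in separate transactions that can deadlock against each other.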

  2. Update rows in TeamTasks for two specific teams and a specific task, across multiple rounds. It is implemented like this:
# I have team1_id, team2_id and task_id

query1 = "SELECT score from teamtasks WHERE team_id=%s AND task_id=%s AND round=%s FOR NO KEY UPDATE"

query2 = "UPDATE teamtasks SET score = %s WHERE team_id=%s AND task_id=%s AND round >= %s"

with aux.db_cursor() as (conn, curs):
    curs.execute(query1, (team1_id, task_id, cur_round))
    score1, = curs.fetchone()

    curs.execute(query1, (team2_id, task_id, cur_round))
    score2, = curs.fetchone()

    sleep(0.1)  # Here happens something time-consuming

    curs.execute(query2, (score1 + 0.1, team1_id, task_id, cur_round))
    curs.execute(query2, (score2 - 0.1, team2_id, task_id, cur_round))

    conn.commit()

I can guarantee that in the second function each team is the subject of at most one update at a time, so the teams involved in concurrent updates are always distinct.

Also, the first function runs rarely, and nothing besides these two functions updates this table, so the lock in the first function is there precisely to keep the table from changing while TeamTasks is being copied.

In the setup described above, I'm getting a lot of deadlocks like this one:

postgres_1  | 2019-11-17 20:43:08.510 UTC [49] ERROR:  deadlock detected
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] DETAIL:  Process 49 waits for ShareLock on transaction 685; blocked by process 65.
postgres_1  |   Process 65 waits for ShareLock on transaction 658; blocked by process 49.
postgres_1  |   Process 49:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;
postgres_1  |
postgres_1  |   Process 65: SELECT score from teamtasks WHERE team_id=0 AND task_id=8 AND round=0 FOR NO KEY UPDATE
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] HINT:  See server log for query details.
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] CONTEXT:  while locking tuple (0,69) in relation "teamtasks"
postgres_1  | 2019-11-17 20:43:08.510 UTC [49] STATEMENT:
postgres_1  |           WITH prev_table AS (
postgres_1  |               SELECT score FROM teamtasks
postgres_1  |               WHERE task_id = 8 AND team_id = 6 AND round <= 1 - 1
postgres_1  |               ORDER BY round DESC LIMIT 1
postgres_1  |               FOR SHARE
postgres_1  |           )
postgres_1  |           INSERT INTO TeamTasks (task_id, team_id, round, score)
postgres_1  |           SELECT 8, 6, 1, score
postgres_1  |           FROM prev_table;

How can I fix these deadlocks? Is there a neat solution I'm not seeing?

The select for share doesn't seem necessary here. That syntax is meant to preserve referential integrity. In your case you are selecting from and inserting into the same teamtasks table, so you are holding locks on it unnecessarily, which causes your two connections to block each other (and ultimately it would be best to refactor the code so that you use only one connection, though I don't know how feasible that is for you). As far as I know, the select for share syntax is more about updates and referential integrity involving other tables than about inserts into the same table.
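Concretely, that means the copy query from the question with the row lock simply dropped (parameter placeholders unchanged):

```sql
WITH prev_table AS (
    SELECT score FROM teamtasks
    WHERE task_id = %(task_id)s AND team_id = %(team_id)s AND round <= %(round)s - 1
    ORDER BY round DESC LIMIT 1
)
INSERT INTO TeamTasks (task_id, team_id, round, score)
SELECT %(task_id)s, %(team_id)s, %(round)s, score
FROM prev_table;
```

Without FOR SHARE, the SELECT takes no row locks, so it can no longer queue behind (or block) the FOR NO KEY UPDATE locks taken by the second function.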

The problem is that in the first aux.db_cursor() call you accumulate FOR SHARE locks while looping over range(team_count) and range(task_count), and then in the second aux.db_cursor() call you perform a time-consuming task before issuing UPDATEs on some rows; those UPDATE lock requests conflict with those FOR SHARE locks. I would get rid of the FOR SHARE locks unless you really need them (and at that point I'd look for a way to consolidate everything into a single database connection, if possible).

Disclosure: I work for EnterpriseDB (EDB).