Deadlock involving SELECT FOR UPDATE

I have a transaction with several queries. First, I select rows with a FOR UPDATE lock:

SELECT f.source_id FROM files AS f WHERE
    f.component_id = $1 AND
    f.archived_at IS NULL
FOR UPDATE

Next comes an update query:

UPDATE files AS f SET archived_at = NOW()
WHERE
hw_component_id = $1 AND
f.source_id = ANY($2::text[])

Then there is an insert:

INSERT INTO files AS f (
    source_id,
    ...
)
VALUES (..)
ON CONFLICT (component_id, source_id) DO UPDATE
SET archived_at = null,
is_valid = excluded.is_valid

I have two application instances, and sometimes I see deadlock errors in the PostgreSQL log:

ERROR:  deadlock detected
DETAIL:  Process 3992939 waits for ShareLock on transaction 230221362; blocked by process 4108096.
Process 4108096 waits for ShareLock on transaction 230221365; blocked by process 3992939.
Process 3992939: SELECT f.source_id FROM files AS f WHERE f.component_id =  AND f.archived_at IS NULL FOR UPDATE
Process 4108096: INSERT INTO files AS f (source_id, ...) VALUES (..) ON CONFLICT (component_id, source_id) DO UPDATE SET archived_at = null, is_valid = excluded.is_valid
CONTEXT:  while locking tuple (41116,185) in relation "files"

I assume that it may be caused by the ON CONFLICT DO UPDATE statement, which may update rows that are not locked by the previous SELECT FOR UPDATE.

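But I don't understand how the SELECT ... FOR UPDATE query can cause a deadlock if it is the first query in the transaction. There is no query before it. Can a SELECT ... FOR UPDATE statement lock several rows and then wait for other rows matching its condition to be unlocked?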

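SELECT FOR UPDATE is no safeguard against deadlocks. It just locks rows. Locks are acquired along the way, in the order instructed by ORDER BY, or in arbitrary order in the absence of ORDER BY. The best defense against deadlocks is to lock rows in a consistent order throughout the transaction, and to do so in all concurrent transactions as well. Or, as the manual puts it: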

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.

Otherwise, this can happen (row1, row2, ... are rows numbered according to a virtual consistent order):

T1: SELECT FOR UPDATE ...          -- lock row2, row3
        T2: SELECT FOR UPDATE ...  -- lock row4, wait for T1 to release row2 
T1: INSERT ... ON CONFLICT ...     -- wait for T2 to release lock on row4

--> deadlock

Adding ORDER BY to your SELECT ... FOR UPDATE may already avoid your deadlocks. (It would avoid the one demonstrated above.) Or this happens, and you have to do more:

T1: SELECT FOR UPDATE ...          -- lock row2, row3
        T2: SELECT FOR UPDATE ...  -- lock row1, wait for T1 to release row2 
T1: INSERT ... ON CONFLICT ...     -- wait for T2 to release lock on row1

--> deadlock

Everything within the transaction must happen in a consistent order to be absolutely sure.
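One way to approach a consistent order on the application side is to sort the rows by the same key in every code path before sending them. The following is a minimal Python sketch under assumptions, not the asker's actual code: the names FileRow, in_lock_order and build_upsert are hypothetical, the column list (component_id, source_id, is_valid) is guessed from the conflict target and SET clause in the question, and the %s placeholders assume a psycopg-style driver. It also assumes that a multi-row INSERT processes its VALUES rows in the order given, which is what happens in practice but is not something to rely on blindly, and that the SELECT ... FOR UPDATE uses a matching ORDER BY.

```python
from typing import Iterable, List, NamedTuple, Tuple


class FileRow(NamedTuple):
    # Hypothetical row shape; the real column list is elided in the question.
    component_id: int
    source_id: str
    is_valid: bool


def in_lock_order(rows: Iterable[FileRow]) -> List[FileRow]:
    """Sort rows by the key that every transaction agrees to lock in."""
    return sorted(rows, key=lambda r: (r.component_id, r.source_id))


def build_upsert(rows: Iterable[FileRow]) -> Tuple[str, list]:
    """Build a multi-row UPSERT whose VALUES list is in lock order."""
    ordered = in_lock_order(rows)
    if not ordered:
        raise ValueError("need at least one row")
    placeholders = ", ".join(["(%s, %s, %s)"] * len(ordered))
    sql = (
        "INSERT INTO files AS f (component_id, source_id, is_valid) "
        f"VALUES {placeholders} "
        "ON CONFLICT (component_id, source_id) DO UPDATE "
        "SET archived_at = NULL, is_valid = excluded.is_valid"
    )
    # Flatten the sorted rows into one parameter list matching the placeholders.
    params = [value for row in ordered for value in row]
    return sql, params
```

The returned pair would then be passed to the driver's execute call inside the same transaction as the ordered SELECT ... FOR UPDATE.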

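Also, your UPDATE does not seem to match the SELECT FOR UPDATE: component_id <> hw_component_id. Typo?
Also, f.archived_at IS NULL in the SELECT does not guarantee that the later SET archived_at = NOW() only affects those rows. You would have to add f.archived_at IS NULL to the WHERE clause of the UPDATE as well. (Seems like a good idea in any case?)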

I assume that it may be caused by ON CONFLICT DO UPDATE statement, which may update rows which are not locked by previous SELECT FOR UPDATE.

As long as the UPSERT (ON CONFLICT DO UPDATE) sticks to the consistent order, that would not be a problem. But that may be hard or impossible to enforce.

Can SELECT ... FOR UPDATE statement lock several rows and then wait for other rows in condition to be unlocked?

Yes, as explained above, locks are acquired along the way. The statement may have to stop and wait halfway through.

NOWAIT

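If all of that still does not resolve your deadlocks, the slow and sure method is to use the Serializable isolation level. You then have to be prepared for serialization failures and retry the transaction in that case. That is considerably more expensive overall.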

Or it may be enough to add NOWAIT:
SELECT FROM files
WHERE  component_id = $1
AND    archived_at IS NULL
ORDER  BY id   -- whatever you use for consistent, deterministic order
FOR    UPDATE NOWAIT;

The manual:

With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately.

You may even skip the ORDER BY clause when using NOWAIT if you cannot establish a consistent order with the UPSERT anyway.

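Then you have to catch that error and retry the transaction. This is similar to catching serialization failures, but cheaper, and less reliable. For example, multiple transactions can still interlock with their UPSERT alone. But that becomes less and less likely.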
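The catch-and-retry step could look roughly like the Python sketch below. It is deliberately driver-agnostic and makes assumptions: run_with_retry and is_retryable are hypothetical names, txn stands for a callable that runs the whole transaction from BEGIN to COMMIT (rolling back on error), and the driver is assumed to expose the SQLSTATE on the exception as .sqlstate (as psycopg 3 does) or .pgcode (as psycopg2 does). The SQLSTATE values themselves are PostgreSQL's codes for lock_not_available, deadlock_detected and serialization_failure.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# SQLSTATE codes for the failures discussed above.
RETRYABLE_SQLSTATES = {
    "55P03",  # lock_not_available, raised by FOR UPDATE NOWAIT
    "40P01",  # deadlock_detected
    "40001",  # serialization_failure
}


def is_retryable(exc: BaseException) -> bool:
    # Assumption: the driver attaches the SQLSTATE as .sqlstate or .pgcode.
    code = getattr(exc, "sqlstate", None) or getattr(exc, "pgcode", None)
    return code in RETRYABLE_SQLSTATES


def run_with_retry(txn: Callable[[], T], attempts: int = 5,
                   base_delay: float = 0.05) -> T:
    """Run txn, retrying on lock, deadlock, or serialization errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except Exception as exc:
            # Give up on unrelated errors or once the attempts are used up.
            if not is_retryable(exc) or attempt == attempts:
                raise
            # Exponential backoff with jitter so competing instances
            # do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")
```

The jittered backoff is a design choice rather than a requirement: with only two application instances, even a short random pause makes it unlikely that both collide again on the next attempt.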