Why can write skew happen under Repeatable Read?

Wikipedia says:

Repeatable read:
In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so phantom reads can occur.

Write skew is possible at this isolation level, a phenomenon where two writes are allowed to the same column(s) in a table by two different writers (who have previously read the columns they are updating), resulting in the column having data that is a mix of the two transactions.
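The lock-based behaviour described in the quote can be sketched with a toy lock table. This is a minimal simulation under stated assumptions, not how any real engine is written: `LockManager`, `select_on_call`, the row ids and the extra doctor "Carol" are all hypothetical. Shared locks are taken only on rows that already match the query and are held until the end of the transaction, but nothing locks the range the query covers, so a concurrent insert slips through and shows up as a phantom.

```python
class LockConflict(Exception):
    pass


class LockManager:
    """Toy lock table (hypothetical): shared (S) locks on individual existing rows, no range locks."""

    def __init__(self):
        self.shared = {}  # row id -> set of transaction ids holding an S lock

    def lock_shared(self, tx, row_id):
        self.shared.setdefault(row_id, set()).add(tx)  # held until end of transaction

    def lock_exclusive(self, tx, row_id):
        # Only checks for conflicting S locks; X locks themselves are not tracked in this toy.
        holders = self.shared.get(row_id, set()) - {tx}
        if holders:
            raise LockConflict(f"{tx} must wait: row {row_id} is S-locked by {holders}")


table = {
    1: {"name": "Alice", "shift_id": 1234, "on_call": True},
    2: {"name": "Bob", "shift_id": 1234, "on_call": True},
}
locks = LockManager()


def select_on_call(tx, shift_id):
    ids = [rid for rid, row in table.items() if row["shift_id"] == shift_id and row["on_call"]]
    for rid in ids:
        locks.lock_shared(tx, rid)  # lock only the rows that matched
    return ids


first = select_on_call("T1", 1234)

try:
    locks.lock_exclusive("T2", 1)  # T2 tries to UPDATE Alice's row
    table[1]["on_call"] = False
    blocked = False
except LockConflict:
    blocked = True                 # repeatable read holds: T2 has to wait

locks.lock_exclusive("T2", 3)      # a brand-new row: nobody holds a lock on it
table[3] = {"name": "Carol", "shift_id": 1234, "on_call": True}

second = select_on_call("T1", 1234)
print(blocked, first, second)      # True [1, 2] [1, 2, 3]
```

Under this scheme an UPDATE on a row that another transaction has read does block, which is the intuition behind the question; the gap is rows that did not exist when the range was read.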

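I'm curious why write skew can happen under Repeatable Read. The quote says read and write locks are held until the end of the transaction, and that write skew happens when the writers have "previously read the columns they are updating". If a read lock is held on those columns, how can another transaction acquire a write lock on them?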

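The Repeatable Read isolation level guarantees that each transaction reads from a consistent snapshot of the database. In other words, a row retrieved twice within the same transaction always has the same value. Engines such as PostgreSQL and MySQL InnoDB implement this level with multi-version snapshots (MVCC) rather than by holding shared read locks, so a plain SELECT does not stop another transaction from writing the rows it has read.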
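To make "consistent snapshot" concrete, here is a toy multi-version store in Python. It is a minimal sketch, assuming a whole-table copy per commit (real engines version individual rows); `Database`, `Transaction` and the `"alice"` key are hypothetical names for illustration only. A transaction keeps reading the version that was current when it began, even after another transaction commits a change.

```python
class Database:
    """Toy multi-version store (hypothetical): every commit appends a new, never-mutated copy of the table."""

    def __init__(self, rows):
        self.versions = [dict(rows)]  # versions[-1] is the latest committed state

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = db.versions[-1]  # the version that was current at BEGIN
        self.writes = {}

    def read(self, key):
        # A transaction sees its own writes, otherwise only its snapshot,
        # never versions committed after it started.
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        new_version = dict(self.db.versions[-1])
        new_version.update(self.writes)
        self.db.versions.append(new_version)


db = Database({"alice": True})
t1 = db.begin()
first = t1.read("alice")      # t1 sees True

t2 = db.begin()
t2.write("alice", False)
t2.commit()                   # a newer version now exists

second = t1.read("alice")     # still read from t1's snapshot
print(first, second, db.versions[-1]["alice"])  # True True False
```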

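Many databases, such as PostgreSQL and SQL Server, can detect a lost update (a special case of write skew) at the Repeatable Read isolation level, but others cannot (for example, the InnoDB engine in MySQL).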
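Lost-update detection can be sketched as a "first committer wins" rule: a transaction may not overwrite a row that someone else committed after its snapshot was taken. This is a simplified, hypothetical model (all class and function names are illustrative); for simplicity it checks at commit time, whereas a real engine such as PostgreSQL typically detects the conflict when the second UPDATE touches the row. The `detect_lost_updates` flag stands in for the difference between an engine that checks and one that does not.

```python
class SerializationError(Exception):
    pass


class Database:
    def __init__(self, rows, detect_lost_updates):
        self.data = dict(rows)
        self.version = {key: 0 for key in rows}  # bumped on every committed write
        self.detect_lost_updates = detect_lost_updates


class Transaction:
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)
        self.snapshot_version = dict(db.version)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        if self.db.detect_lost_updates:
            # First committer wins: refuse to overwrite a row that another
            # transaction committed after our snapshot was taken.
            for key in self.writes:
                if self.db.version[key] != self.snapshot_version[key]:
                    raise SerializationError(f"concurrent update to {key!r}")
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1


def run(detect_lost_updates):
    db = Database({"counter": 0}, detect_lost_updates)
    t1, t2 = Transaction(db), Transaction(db)
    t1.write("counter", t1.read("counter") + 1)  # read-modify-write in application code
    t2.write("counter", t2.read("counter") + 1)
    t1.commit()
    try:
        t2.commit()
    except SerializationError:
        return db.data["counter"], "t2 aborted"
    return db.data["counter"], "both committed"


print(run(detect_lost_updates=False))  # (1, 'both committed'): one increment was silently lost
print(run(detect_lost_updates=True))   # (1, 't2 aborted'): t2 must be retried
```

Note that the check only looks at rows the transaction writes, which is exactly why it cannot catch the case discussed next.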

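Now back to write skew. Under Repeatable Read there are situations that most database engines cannot detect. One such case is when two concurrent transactions each modify a different object, and the combination of their changes creates a race condition.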

Let me use the example from the book Designing Data-Intensive Applications. The scenario is as follows:

You are writing an application for doctors to manage their on-call shifts at a hospital. The hospital usually tries to have several doctors on call at any one time, but it absolutely must have at least one doctor on call. Doctors can give up their shifts (e.g., if they are sick themselves), provided that at least one colleague remains on call in that shift.

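The interesting question is how to implement this on top of the database. Here is some pseudocode that mixes SQL with application logic: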

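BEGIN TRANSACTION;
    -- How many doctors are on call for this shift right now?
    SELECT COUNT(*) AS current_on_call FROM doctors
        WHERE on_call = true
        AND shift_id = 1234;

    -- Application logic: only go off call if someone else remains.
    if (current_on_call >= 2) {
        UPDATE doctors
        SET on_call = false WHERE name = 'Alice' AND shift_id = 1234;
    }
COMMIT;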

[Figure: Alice and Bob run the transaction above concurrently]

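In this scenario, Alice and Bob run the code above at the same time. Each transaction reads from its own snapshot, sees two doctors on call, and goes ahead with its update. However, they modify different rows: Alice updates Alice's record and Bob updates Bob's record, so there is no write-write conflict for the database to detect. A database at the Repeatable Read isolation level has no way to know that the condition (total doctors on call >= 2) has been violated, and both transactions commit. This is write skew.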
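The Alice and Bob race can be replayed in a toy snapshot-isolation model that does include lost-update detection, to show that such detection is not enough. This is a hypothetical sketch assuming a single shift (1234) and two doctors; the class and method names are illustrative, and the SQL each method stands for is noted in a comment. Because the conflict check only covers the rows a transaction writes, and Alice and Bob write different rows, both commits succeed.

```python
class SerializationError(Exception):
    pass


class Database:
    def __init__(self, rows):
        self.rows = dict(rows)                          # doctor name -> on_call (shift 1234 only)
        self.row_version = {name: 0 for name in rows}


class Transaction:
    """Snapshot reads plus first-committer-wins checking on the WRITE set only."""

    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.rows)
        self.seen_version = dict(db.row_version)
        self.writes = {}

    def count_on_call(self):
        # SELECT COUNT(*) FROM doctors WHERE on_call = true AND shift_id = 1234
        return sum(1 for on_call in self.snapshot.values() if on_call)

    def go_off_call(self, name):
        # UPDATE doctors SET on_call = false WHERE name = ... AND shift_id = 1234
        self.writes[name] = False

    def commit(self):
        for name in self.writes:  # lost-update check: only rows we WROTE are checked
            if self.db.row_version[name] != self.seen_version[name]:
                raise SerializationError(f"concurrent update to {name!r}")
        for name, value in self.writes.items():
            self.db.rows[name] = value
            self.db.row_version[name] += 1


db = Database({"Alice": True, "Bob": True})
alice_tx, bob_tx = Transaction(db), Transaction(db)  # both start before either commits

if alice_tx.count_on_call() >= 2:  # Alice sees 2
    alice_tx.go_off_call("Alice")
if bob_tx.count_on_call() >= 2:    # Bob also sees 2 in his snapshot
    bob_tx.go_off_call("Bob")

alice_tx.commit()
bob_tx.commit()  # no conflict: nobody else wrote Bob's row
print(db.rows)   # {'Alice': False, 'Bob': False}: nobody is on call
```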

There are two ways to solve this problem:

  1. Explicitly lock all the on-call records for the shift, so that one of Alice and Bob has to wait for the other's transaction to finish.

Here is the same pseudocode using a SELECT ... FOR UPDATE query:

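BEGIN TRANSACTION;
    -- Important: FOR UPDATE locks every row matched by this query, so a
    -- concurrent transaction running the same query waits until we commit.
    SELECT * FROM doctors
        WHERE on_call = true
        AND shift_id = 1234 FOR UPDATE;

    -- Application logic: current_on_call = number of rows returned above.
    if (current_on_call >= 2) {
        UPDATE doctors
        SET on_call = false WHERE name = 'Alice' AND shift_id = 1234;
    }
COMMIT;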
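The effect of the locking read can be mimicked with an ordinary mutex: whoever holds it does the check and the update without interference, and the other transaction waits. This is a rough, hypothetical sketch (`shift_1234_lock` and `give_up_shift` are illustrative names, and one Python lock stands in for the set of row locks). It mirrors InnoDB, where a locking read sees the latest committed data after the wait; PostgreSQL under REPEATABLE READ would instead abort the waiting transaction with a serialization error, which the application then retries.

```python
import threading

on_call = {"Alice": True, "Bob": True}  # shared state for shift 1234
# Hypothetical stand-in for the row locks taken by SELECT ... FOR UPDATE.
shift_1234_lock = threading.Lock()


def give_up_shift(name):
    with shift_1234_lock:                         # SELECT ... FOR UPDATE (waits if held)
        current_on_call = sum(on_call.values())  # reads the latest committed data
        if current_on_call >= 2:
            on_call[name] = False                # UPDATE doctors SET on_call = false ...
    # leaving the with-block plays the role of COMMIT, releasing the lock


threads = [threading.Thread(target=give_up_shift, args=(name,)) for name in ("Alice", "Bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(on_call.values()))  # 1: exactly one doctor is still on call
```

Which doctor stays on call depends on which thread acquires the lock first, but the invariant holds either way.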
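  2. Use a stricter isolation level. MySQL, PostgreSQL, and SQL Server (T-SQL) all provide a SERIALIZABLE isolation level, under which the database blocks or aborts one of the conflicting transactions; an aborted transaction has to be retried by the application.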
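One way a serializable mode can catch this case is to validate what a transaction read, not only what it wrote. The sketch below uses simple backward validation from optimistic concurrency control; it is a named stand-in chosen for illustration, not PostgreSQL's actual Serializable Snapshot Isolation algorithm, and all names are hypothetical. It tracks individual rows rather than predicates, so it would not catch phantoms (newly inserted rows).

```python
class SerializationError(Exception):
    pass


class Database:
    def __init__(self, rows):
        self.rows = dict(rows)                          # doctor name -> on_call (shift 1234 only)
        self.row_version = {name: 0 for name in rows}


class Transaction:
    """Snapshot reads, but commit validates every row that was READ as well as written."""

    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.rows)
        self.seen_version = dict(db.row_version)
        self.read_set = set()
        self.writes = {}

    def count_on_call(self):
        self.read_set.update(self.snapshot)  # the query examined every row of the shift
        return sum(1 for on_call in self.snapshot.values() if on_call)

    def go_off_call(self, name):
        self.writes[name] = False

    def commit(self):
        # Backward validation: abort if anything we read or wrote was changed by a
        # transaction that committed after our snapshot was taken.
        for name in self.read_set | set(self.writes):
            if self.db.row_version[name] != self.seen_version[name]:
                raise SerializationError(f"{name!r} changed since snapshot; retry")
        for name, value in self.writes.items():
            self.db.rows[name] = value
            self.db.row_version[name] += 1


db = Database({"Alice": True, "Bob": True})
alice_tx, bob_tx = Transaction(db), Transaction(db)

if alice_tx.count_on_call() >= 2:
    alice_tx.go_off_call("Alice")
if bob_tx.count_on_call() >= 2:
    bob_tx.go_off_call("Bob")

alice_tx.commit()        # validates fine: nothing changed yet
try:
    bob_tx.commit()      # Bob read Alice's row, which Alice has since changed
except SerializationError as err:
    print("Bob's transaction aborted:", err)

print(db.rows)  # {'Alice': False, 'Bob': True}
```

If Bob retries, his new snapshot shows only one doctor on call, so his transaction leaves him on call.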