Deadlock on insert/select

OK, I'm completely lost on this deadlock issue. I just don't know how to solve it.

I have these three tables (I removed the columns that aren't important):

CREATE TABLE [dbo].[ManageServicesRequest]
(
    [ReferenceTransactionId]    INT                 NOT NULL,
    [OrderDate]                 DATETIMEOFFSET(7)   NOT NULL,
    [QueuePriority]             INT                 NOT NULL,
    [Queued]                    DATETIMEOFFSET(7)   NULL,
    CONSTRAINT [PK_ManageServicesRequest] PRIMARY KEY CLUSTERED ([ReferenceTransactionId]),
)

CREATE TABLE [dbo].[ServiceChange]
(
    [ReferenceTransactionId]    INT                 NOT NULL,
    [ServiceId]                 VARCHAR(50)         NOT NULL,
    [ServiceStatus]             CHAR(1)             NOT NULL,
    [ValidFrom]                 DATETIMEOFFSET(7)   NOT NULL,
    CONSTRAINT [PK_ServiceChange] PRIMARY KEY CLUSTERED ([ReferenceTransactionId],[ServiceId]),
    CONSTRAINT [FK_ServiceChange_ManageServiceRequest] FOREIGN KEY ([ReferenceTransactionId]) REFERENCES [ManageServicesRequest]([ReferenceTransactionId]) ON DELETE CASCADE,
    INDEX [IDX_ServiceChange_ManageServiceRequestId] ([ReferenceTransactionId]),
    INDEX [IDX_ServiceChange_ServiceId] ([ServiceId])
)

CREATE TABLE [dbo].[ServiceChangeParameter]
(
    [ReferenceTransactionId]    INT                 NOT NULL,
    [ServiceId]                 VARCHAR(50)         NOT NULL,
    [ParamCode]                 VARCHAR(50)         NOT NULL,
    [ParamValue]                VARCHAR(50)         NOT NULL,
    [ParamValidFrom]            DATETIMEOFFSET(7)   NOT NULL,
    CONSTRAINT [PK_ServiceChangeParameter] PRIMARY KEY CLUSTERED ([ReferenceTransactionId],[ServiceId],[ParamCode]),
    CONSTRAINT [FK_ServiceChangeParameter_ServiceChange] FOREIGN KEY ([ReferenceTransactionId],[ServiceId]) REFERENCES [ServiceChange] ([ReferenceTransactionId],[ServiceId]) ON DELETE CASCADE,
    INDEX [IDX_ServiceChangeParameter_ManageServiceRequestId] ([ReferenceTransactionId]),
    INDEX [IDX_ServiceChangeParameter_ServiceId] ([ServiceId]),
    INDEX [IDX_ServiceChangeParameter_ParamCode] ([ParamCode])
)

And these two procedures:

CREATE PROCEDURE [dbo].[spCreateManageServicesRequest]
    @ReferenceTransactionId INT,
    @OrderDate DATETIMEOFFSET,
    @QueuePriority INT,
    @Services ServiceChangeUdt READONLY,
    @Parameters ServiceChangeParameterUdt READONLY
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
    /* CREATE A NEW SERVICE CHANGE REQUEST */

        /*  INSERT REQUEST  */
        INSERT INTO [dbo].[ManageServicesRequest]
            ([ReferenceTransactionId]
            ,[OrderDate]
            ,[QueuePriority]
            ,[Queued])
        VALUES
            (@ReferenceTransactionId
            ,@OrderDate
            ,@QueuePriority
            ,NULL)

        /*  INSERT SERVICES */
        INSERT INTO [dbo].[ServiceChange]
            ([ReferenceTransactionId]
            ,[ServiceId]
            ,[ServiceStatus]
            ,[ValidFrom])
        SELECT 
             @ReferenceTransactionId AS [ReferenceTransactionId]
            ,[ServiceId]
            ,[ServiceStatus]
            ,[ValidFrom]
        FROM @Services AS [S]

        /*  INSERT PARAMS   */
        INSERT INTO [dbo].[ServiceChangeParameter]
            ([ReferenceTransactionId]
            ,[ServiceId]
            ,[ParamCode]
            ,[ParamValue]
            ,[ParamValidFrom])
        SELECT 
            @ReferenceTransactionId AS [ReferenceTransactionId]
            ,[ServiceId]
            ,[ParamCode]
            ,[ParamValue]
            ,[ParamValidFrom]
        FROM @Parameters AS [P]

    END TRY
    BEGIN CATCH
        THROW
    END CATCH
END
GO

CREATE PROCEDURE [dbo].[spGetManageServicesRequest]
    @ReferenceTransactionId INT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY 
        /* RETURN MANAGE SERVICES REQUEST BY ID */

        SELECT 
            [MR].[ReferenceTransactionId], 
            [MR].[OrderDate], 
            [MR].[QueuePriority], 
            [MR].[Queued], 
            
            [SC].[ReferenceTransactionId], 
            [SC].[ServiceId], 
            [SC].[ServiceStatus], 
            [SC].[ValidFrom],
            
            [SP].[ReferenceTransactionId], 
            [SP].[ServiceId], 
            [SP].[ParamCode], 
            [SP].[ParamValue], 
            [SP].[ParamValidFrom]

        FROM [dbo].[ManageServicesRequest] AS [MR]
        LEFT JOIN [dbo].[ServiceChange] AS [SC] ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
        LEFT JOIN [dbo].[ServiceChangeParameter] AS [SP] ON [SP].[ReferenceTransactionId] = [SC].[ReferenceTransactionId] AND [SP].[ServiceId] = [SC].[ServiceId]
        WHERE [MR].[ReferenceTransactionId] = @ReferenceTransactionId

    END TRY
    BEGIN CATCH
        THROW
    END CATCH
END

This is how they are used. The following is a simplified C# method that creates a record and then posts it to a microservice queue:

public async Task Consume(ConsumeContext<CreateCommand> context)
{
    using (var sql = sqlFactory.Cip)
    {
        /* SAVE REQUEST TO DATABASE */
        sql.StartTransaction(System.Data.IsolationLevel.Serializable); // <-- first transaction starts

        /* Create id */
        var transactionId = await GetNewId(context.Message.CorrelationId);

        /* Create manage services request */
        await sql.OrderingGateway.ManageServices.Create(transactionId, context.Message.ApiRequest.OrderDate, context.Message.ApiRequest.Priority, services);

        sql.Commit(); // <-- first transaction ends

        // ... some other stuff ...

        /* Fetch the same object you created in the first transaction */
        try
        {
            sql.StartTransaction(System.Data.IsolationLevel.Serializable);

            var request = await sql.OrderingGateway.ManageServices.Get(transactionId); // <-- HERE BE THE DEADLOCK

            request.Queued = DateTimeOffset.Now;
            await sql.OrderingGateway.ManageServices.Update(request);

            // ... here is a posting to a microservice queue ...

            sql.Commit();
        }
        catch (Exception)
        {
            sql.RollBack();
        }

        // ... some other stuff ...
    }
}

So my question is: why do these two procedures deadlock? The first and second transactions never run in parallel for the same record.

Here are the deadlock details:

<deadlock>
  <victim-list>
    <victimProcess id="process1dbfa86c4e8" />
  </victim-list>
  <process-list>
    <process id="process1dbfa86c4e8" taskpriority="0" logused="0" waitresource="KEY: 18:72057594046775296 (b42d8e559092)" waittime="2503" ownerId="33411557480" transactionname="user_transaction" lasttranstarted="2021-12-01T01:06:15.303" XDES="0x1ddd2df4420" lockMode="RangeS-S" schedulerid="20" kpid="23000" status="suspended" spid="55" sbid="2" ecid="0" priority="0" trancount="1" lastbatchstarted="2021-12-01T01:06:15.310" lastbatchcompleted="2021-12-01T01:06:15.300" lastattention="1900-01-01T00:00:00.300" clientapp="Core Microsoft SqlClient Data Provider" hostpid="11020" isolationlevel="serializable (4)" xactid="33411557480" currentdb="18" currentdbname="xxx" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
      <executionStack>
        <frame procname="xxx.dbo.spGetManageServicesRequest" line="10" stmtstart="356" stmtend="4256" sqlhandle="0x030012001374fc02f91433019aad000001000000000000000000000000000000000000000000000000000000"></frame>
      </executionStack>
    </process>
    <process id="process1dbfa1c1c28" taskpriority="0" logused="1232" waitresource="KEY: 18:72057594046971904 (ffffffffffff)" waittime="6275" ownerId="33411563398" transactionname="user_transaction" lasttranstarted="2021-12-01T01:06:16.450" XDES="0x3d4e842c420" lockMode="RangeI-N" schedulerid="31" kpid="36432" status="suspended" spid="419" sbid="2" ecid="0" priority="0" trancount="2" lastbatchstarted="2021-12-01T01:06:16.480" lastbatchcompleted="2021-12-01T01:06:16.463" lastattention="1900-01-01T00:00:00.463" clientapp="Core Microsoft SqlClient Data Provider"  hostpid="11020" isolationlevel="serializable (4)" xactid="33411563398" currentdb="18" currentdbname="xxx" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
      <executionStack>
        <frame procname="xxx.dbo.spCreateManageServicesRequest" line="40" stmtstart="2592" stmtend="3226" sqlhandle="0x03001200f01ab84aeb1433019aad000001000000000000000000000000000000000000000000000000000000"></frame>
      </executionStack>
    </process>
  </process-list>
  <resource-list>
    <keylock hobtid="72057594046775296" dbid="18" objectname="xxx.dbo.ServiceChange" indexname="PK_ServiceChange" id="lock202ecfd0380" mode="X" associatedObjectId="72057594046775296">
      <owner-list>
        <owner id="process1dbfa1c1c28" mode="X" />
      </owner-list>
      <waiter-list>
        <waiter id="process1dbfa86c4e8" mode="RangeS-S" requestType="wait" />
      </waiter-list>
    </keylock>
    <keylock hobtid="72057594046971904" dbid="18" objectname="xxx.dbo.ServiceChangeParameter" indexname="PK_ServiceChangeParameter" id="lock27d3d371880" mode="RangeS-S" associatedObjectId="72057594046971904">
      <owner-list>
        <owner id="process1dbfa86c4e8" mode="RangeS-S" />
      </owner-list>
      <waiter-list>
        <waiter id="process1dbfa1c1c28" mode="RangeI-N" requestType="wait" />
      </waiter-list>
    </keylock>
  </resource-list>
</deadlock>

Why does this deadlock happen, and how can I avoid it in the future?

Edit: Here is the plan for the Get procedure: https://www.brentozar.com/pastetheplan/?id=B1UMMhaqF

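Another edit: Following GSerg's comment, I changed the line number in the deadlock graph from 65 to 40, because I had removed columns that aren't important to the question.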

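You'd be best off avoiding the serializable isolation level. The way the serializable guarantees are provided is often prone to deadlocks.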

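If you can't change your stored procedures to use more targeted locking hints that guarantee the results you need at a lower isolation level, you can prevent this particular deadlock scenario by making sure that all locks on ServiceChange are taken out before any are taken out on ServiceChangeParameter.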

One way of doing this would be to introduce a table variable in spGetManageServicesRequest and materialize the results of

SELECT ...
FROM [dbo].[ManageServicesRequest] AS [MR]
  LEFT JOIN [dbo].[ServiceChange] AS [SC]  ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]

into the table variable.

Then join that against [dbo].[ServiceChangeParameter] to get your final result.

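The phase separation introduced by the table variable ensures that the SELECT acquires its locks in the same object order as the insert does. That prevents the deadlock in which the SELECT already holds a lock on ServiceChangeParameter and is waiting to acquire one on ServiceChange (as in the deadlock graph here).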
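A minimal sketch of that table-variable approach follows, assuming the trimmed-down schema from the question. The name `@RequestServices` and the `MR_`/`SC_` column prefixes are made up for illustration, and the original TRY/CATCH wrapper is left out for brevity. The final SELECT keeps the original column order, so callers that read the result by ordinal should not need to change.

```sql
-- Sketch only: a possible two-phase version of spGetManageServicesRequest.
-- @RequestServices and the MR_/SC_ column prefixes are hypothetical names.
ALTER PROCEDURE [dbo].[spGetManageServicesRequest]
    @ReferenceTransactionId INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @RequestServices TABLE
    (
        [MR_ReferenceTransactionId] INT               NOT NULL,
        [OrderDate]                 DATETIMEOFFSET(7) NOT NULL,
        [QueuePriority]             INT               NOT NULL,
        [Queued]                    DATETIMEOFFSET(7) NULL,
        [SC_ReferenceTransactionId] INT               NULL,  -- NULL when the request has no ServiceChange rows
        [ServiceId]                 VARCHAR(50)       NULL,
        [ServiceStatus]             CHAR(1)           NULL,
        [ValidFrom]                 DATETIMEOFFSET(7) NULL
    );

    /* Phase 1: every lock on ManageServicesRequest and ServiceChange is taken here */
    INSERT INTO @RequestServices
    SELECT
        [MR].[ReferenceTransactionId], [MR].[OrderDate], [MR].[QueuePriority], [MR].[Queued],
        [SC].[ReferenceTransactionId], [SC].[ServiceId], [SC].[ServiceStatus], [SC].[ValidFrom]
    FROM [dbo].[ManageServicesRequest] AS [MR]
    LEFT JOIN [dbo].[ServiceChange] AS [SC] ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
    WHERE [MR].[ReferenceTransactionId] = @ReferenceTransactionId;

    /* Phase 2: only now is ServiceChangeParameter touched (same object order as spCreateManageServicesRequest) */
    SELECT
        [RS].[MR_ReferenceTransactionId] AS [ReferenceTransactionId],
        [RS].[OrderDate],
        [RS].[QueuePriority],
        [RS].[Queued],
        [RS].[SC_ReferenceTransactionId] AS [ReferenceTransactionId],
        [RS].[ServiceId],
        [RS].[ServiceStatus],
        [RS].[ValidFrom],
        [SP].[ReferenceTransactionId],
        [SP].[ServiceId],
        [SP].[ParamCode],
        [SP].[ParamValue],
        [SP].[ParamValidFrom]
    FROM @RequestServices AS [RS]
    LEFT JOIN [dbo].[ServiceChangeParameter] AS [SP]
        ON  [SP].[ReferenceTransactionId] = [RS].[SC_ReferenceTransactionId]
        AND [SP].[ServiceId] = [RS].[ServiceId];
END
```

Under serializable, the locks taken in phase 1 are still held until the caller commits. Phase 2 only adds locks on ServiceChangeParameter, so this procedure can no longer hold a ServiceChangeParameter lock while waiting for a ServiceChange lock.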

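It can be instructive to look at exactly which locks the SELECT takes out when it runs at the serializable isolation level. You can see these with Extended Events or with the undocumented trace flag 1200.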
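One way to capture that output is sketched below. It assumes a test or development instance, because trace flag 1200 is undocumented. Trace flag 3604 sends the trace output to the client's Messages output instead of the error log. The id 26410822 is simply one of the sample rows used later in this answer.

```sql
-- Sketch: print every lock the Get procedure acquires (test/dev instance only).
-- 1200 = undocumented "print lock acquisitions"; 3604 = send trace output to the client.
DBCC TRACEON (3604, 1200);          -- no -1 argument, so only this session is affected

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

    EXEC [dbo].[spGetManageServicesRequest] @ReferenceTransactionId = 26410822;

ROLLBACK TRANSACTION;               -- releases the range locks taken by the SELECT

DBCC TRACEOFF (3604, 1200);
```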

Currently your execution plan is as follows.

With the sample data below

INSERT INTO [dbo].[ManageServicesRequest] 
VALUES (26410821, GETDATE(), 1, GETDATE()), 
       (26410822, GETDATE(), 1, GETDATE()), 
       (26410823, GETDATE(), 1, GETDATE());

INSERT INTO [dbo].[ServiceChange] 
VALUES (26410821, 'X', 'X', GETDATE()), 
       (26410822, 'X', 'X', GETDATE()), 
       (26410823, 'X', 'X', GETDATE());

INSERT INTO [dbo].[ServiceChangeParameter]  
VALUES (26410821, 'X', 'P1','P1', GETDATE()), 
       (26410823, 'X', 'P1','P1', GETDATE());

the trace flag output (for WHERE [MR].[ReferenceTransactionId] = 26410822) is:

Process 51 acquiring IS lock on OBJECT: 7:1557580587:0  (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on OBJECT: 7:1509580416:0  (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on OBJECT: 7:1477580302:0  (class bit2000000 ref1) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:600  (class bit2000000 ref0) result: OK
Process 51 acquiring S lock on KEY: 7:72057594044940288 (1b148afa48fb) (class bit2000000 ref0) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:608  (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045005824 (a69d56b089b6) (class bit2000000 ref0) result: OK
Process 51 acquiring IS lock on PAGE: 7:1:632  (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045202432 (c37d1982c3c9) (class bit2000000 ref0) result: OK
Process 51 acquiring RangeS-S lock on KEY: 7:72057594045005824 (2ef5265f2b42) (class bit2000000 ref0) result: OK
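To tell which table and index each KEY lock belongs to, you can look up the hobt ids in `sys.partitions`, and you can map a key hash back to a row with the undocumented `%%lockres%%` virtual column. The sketch below assumes it runs in the same database as the trace (database id 7 here). The hobt ids are the ones printed above and will differ on any other database, so substitute your own values, including those from the deadlock graph.

```sql
-- Sketch: map hobt ids from the trace (or deadlock graph) to table and index names.
SELECT p.hobt_id,
       OBJECT_NAME(p.object_id) AS table_name,
       i.name                   AS index_name
FROM sys.partitions AS p
JOIN sys.indexes    AS i
  ON i.object_id = p.object_id
 AND i.index_id  = p.index_id
WHERE p.hobt_id IN (72057594044940288, 72057594045005824, 72057594045202432);

-- Sketch: map a key hash such as (c37d1982c3c9) back to the row it represents.
-- %%lockres%% is undocumented; the INDEX hint makes it hash the clustered PK keys.
SELECT %%lockres%% AS lock_resource, *
FROM [dbo].[ServiceChangeParameter] WITH (INDEX([PK_ServiceChangeParameter]));
```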

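The order of locking is indicated in the image below. A range lock covers the range of possible values from the given key value down to the nearest key value below it (in key order, so above it in the image!).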

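First, node 1 is called and takes an S lock on the row in ManageServicesRequest. Then node 2 is called and takes a RangeS-S lock on a key in ServiceChange. The values from that row are then used to do the lookup in ServiceChangeParameter. In this case no rows match the predicate, but a RangeS-S lock is still taken out on the next-highest key, covering the range back to the preceding key (here the range (26410821, 'X', 'P1') ... (26410823, 'X', 'P1')).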

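Node 2 is then called again to see whether there are any more rows. Even when there aren't, an additional RangeS-S lock is taken on the next row in ServiceChange.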

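In your deadlock graph, the range locked in ServiceChangeParameter appears to be the range to infinity (denoted by ffffffffffff). That happens here when the lookup is for a key value at or beyond the last key in the index.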
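The same cycle can probably be reproduced by hand in two query windows, run step by step. The sketch below assumes the trimmed schema and the sample data above, with 26410823 as the highest existing id and 26410824 as a hypothetical new one. It also assumes the Get procedure still gets the nested loops plan traced above; with a different plan, the lock order and the outcome may differ.

```sql
-- Window B (plays spCreateManageServicesRequest), step 1:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO [dbo].[ManageServicesRequest] VALUES (26410824, SYSDATETIMEOFFSET(), 1, NULL);
INSERT INTO [dbo].[ServiceChange]         VALUES (26410824, 'X', 'X', SYSDATETIMEOFFSET());
-- B now holds an X lock on the new ServiceChange key (26410824, 'X').

-- Window A (plays spGetManageServicesRequest), step 2:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
EXEC [dbo].[spGetManageServicesRequest] @ReferenceTransactionId = 26410823;
-- A locks ServiceChangeParameter up to the end of the index (the ffffffffffff range),
-- then waits for RangeS-S on the ServiceChange key (26410824, 'X') held X by B.

-- Window B, step 3:
INSERT INTO [dbo].[ServiceChangeParameter] VALUES (26410824, 'X', 'P1', 'P1', SYSDATETIMEOFFSET());
-- B needs RangeI-N on the ffffffffffff range held by A: a cycle, so the deadlock
-- monitor should pick one session as the victim (error 1205).
```

This mirrors the lock modes in the question's deadlock graph: X and RangeS-S on PK_ServiceChange, and RangeS-S and RangeI-N on PK_ServiceChangeParameter.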

An alternative to the table variable might be to change the query as follows.

SELECT ...
FROM [dbo].[ManageServicesRequest] AS [MR]
  LEFT JOIN [dbo].[ServiceChange] AS [SC]  ON [SC].[ReferenceTransactionId] = [MR].[ReferenceTransactionId]
  LEFT HASH JOIN [dbo].[ServiceChangeParameter] AS [SP] ON [SP].[ReferenceTransactionId] = [MR].[ReferenceTransactionId] AND [SP].[ServiceId] = [SC].[ServiceId]
  WHERE [MR].[ReferenceTransactionId] = @ReferenceTransactionId

The final predicate on [dbo].[ServiceChangeParameter] is changed to reference [MR].[ReferenceTransactionId] instead of [SC].[ReferenceTransactionId], and an explicit hash join hint is added.

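This gives a plan like the one below, in which all the locks on ServiceChange are acquired during the hash table build phase, before any are taken on ServiceChangeParameter. Without the change to the ReferenceTransactionId condition, the new plan did a scan rather than a seek on ServiceChangeParameter. That is why the change was made: it allows the optimizer to use the implied equality predicate on @ReferenceTransactionId.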