Inserting large number of records without locking the table
I am trying to insert 1,500,000 records into a table and am running into table locking during the insert, so I came up with the batched insert below.
DECLARE @BatchSize INT = 50000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination]
                (proj_details_sid,
                 period_sid,
                 sales,
                 units)
    SELECT TOP(@BatchSize) s.proj_details_sid,
           s.period_sid,
           s.sales,
           s.units
    FROM   [dbo].[SOURCE] s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.Destination d
                       WHERE  d.proj_details_sid = s.proj_details_sid
                              AND d.period_sid = s.period_sid)

    IF @@ROWCOUNT < @BatchSize
        BREAK
END
I have a clustered index on the Destination table on (proj_details_sid, period_sid). The NOT EXISTS part is only there to prevent records that have already been inserted from being inserted into the table again.

Am I doing this right? Will it avoid table locking? If not, is there a better approach?

Note: the insert takes roughly the same time with and without batching.
I added (NOLOCK) on your destination table -> dbo.Destination WITH (NOLOCK). Now the existence check won't take shared locks on the table.
DECLARE @BatchSize INT = 50000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination]
                (proj_details_sid,
                 period_sid,
                 sales,
                 units)
    SELECT TOP(@BatchSize) s.proj_details_sid,
           s.period_sid,
           s.sales,
           s.units
    FROM   [dbo].[SOURCE] s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.Destination AS d WITH (NOLOCK)
                       WHERE  d.proj_details_sid = s.proj_details_sid
                              AND d.period_sid = s.period_sid)

    IF @@ROWCOUNT < @BatchSize
        BREAK
END
For this you can use WITH (NOLOCK) in the select statement, but using NOLOCK is not recommended on an OLTP database.
Instead of checking whether the data already exists in Destination, it may be better to store all the data in a temp table first and then insert it into Destination in batches.

Reference: Using ROWLOCK in an INSERT statement (SQL Server)
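The referenced article's idea can be sketched as follows (an illustration only, not tested against your schema): put a ROWLOCK hint on the batched INSERT so the engine starts with row-level locks. Note that the hint is a request, not a guarantee; once a single statement holds roughly 5,000 locks, escalation can still occur.

```sql
-- Sketch: ask for row-level locks on the batched INSERT (hint, not a guarantee)
INSERT INTO [dbo].[Destination] WITH (ROWLOCK)
            (proj_details_sid, period_sid, sales, units)
SELECT proj_details_sid, period_sid, sales, units
FROM   #Temp
WHERE  rownum >= @curRecord AND rownum < @curRecord + @batch
```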
DECLARE @batch int = 100
DECLARE @curRecord int = 1
DECLARE @maxRecord int

-- remove (NOLOCK) if you don't want dirty reads
SELECT ROW_NUMBER() OVER (ORDER BY s.proj_details_sid, s.period_sid) AS rownum,
       s.proj_details_sid,
       s.period_sid,
       s.sales,
       s.units
INTO   #Temp
FROM   [dbo].[SOURCE] s WITH (NOLOCK)
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Destination d WITH (NOLOCK)
                   WHERE  d.proj_details_sid = s.proj_details_sid
                          AND d.period_sid = s.period_sid)

-- change @maxRecord if you want to limit the number of records to insert
SELECT @maxRecord = COUNT(1) FROM #Temp

WHILE @maxRecord >= @curRecord
BEGIN
    INSERT INTO [dbo].[Destination]
                (proj_details_sid,
                 period_sid,
                 sales,
                 units)
    SELECT proj_details_sid, period_sid, sales, units
    FROM   #Temp
    WHERE  rownum >= @curRecord AND rownum < @curRecord + @batch

    SET @curRecord = @curRecord + @batch
END

DROP TABLE #Temp
Lock escalation is unlikely to be related to the SELECT part of your statement at all. It is a natural consequence of inserting a large number of rows.
Lock escalation is triggered when lock escalation is not disabled on the table by using the ALTER TABLE SET LOCK_ESCALATION option, and when either of the following conditions exists:
- A single Transact-SQL statement acquires at least 5,000 locks on a single nonpartitioned table or index.
- A single Transact-SQL statement acquires at least 5,000 locks on a single partition of a partitioned table and the ALTER TABLE SET LOCK_ESCALATION option is set to AUTO.
- The number of locks in an instance of the Database Engine exceeds memory or configuration thresholds.
If locks cannot be escalated because of lock conflicts, the Database Engine periodically triggers lock escalation at every 1,250 new locks acquired.
You can easily see this for yourself by tracing the lock escalation event in Profiler, or simply by experimenting with different batch sizes. For me, TOP (6228) shows 6,250 locks held, but with TOP (6229) the count suddenly drops to 1 as lock escalation kicks in. The exact numbers may well vary (depending on database settings and the resources currently available). Use trial and error to find the threshold at which lock escalation appears for you.
CREATE TABLE [dbo].[Destination]
(
proj_details_sid INT,
period_sid INT,
sales INT,
units INT
)
BEGIN TRAN --So locks are held for us to count in the next statement
INSERT INTO [dbo].[Destination]
SELECT TOP (6229) 1,
1,
1,
1
FROM master..spt_values v1,
master..spt_values v2
SELECT COUNT(*)
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;
COMMIT
DROP TABLE [dbo].[Destination]
You are inserting 50,000 rows per batch, so lock escalation will almost certainly be attempted.
The article How to resolve blocking problems that are caused by lock escalation in SQL Server is quite old, but much of the advice still holds.
- Break up large batch operations into several smaller operations (i.e. use a smaller batch size)
- Lock escalation cannot occur if a different SPID currently holds an incompatible table lock - the example they give is a different session executing
BEGIN TRAN
SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1=0
WAITFOR DELAY '1:00:00'
COMMIT TRAN
- Disable lock escalation by enabling trace flag 1211 - however, this is a global setting and can cause severe problems. There is a newer option, 1224, which is less problematic, but it is still global.
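As a sketch, the trace flags are set like this (instance-wide settings that affect every database on the server, so treat them as a last resort):

```sql
-- Enable at the instance level (-1 = global)
DBCC TRACEON (1224, -1)   -- skip escalation based on lock count; memory pressure can still trigger it
-- DBCC TRACEON (1211, -1) -- disable escalation entirely (riskier)

DBCC TRACEOFF (1224, -1)  -- turn it back off when the load is done
```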
Another option would be ALTER TABLE blah SET (LOCK_ESCALATION = DISABLE), but this is still not very targeted, since it affects all queries against the table, not just the single scenario here.
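For completeness, the table-level option looks like this; remember to restore the default afterwards, since leaving escalation disabled changes locking behavior for every query against the table:

```sql
ALTER TABLE dbo.Destination SET (LOCK_ESCALATION = DISABLE)
-- ... run the bulk load ...
ALTER TABLE dbo.Destination SET (LOCK_ESCALATION = TABLE)  -- restore the default
```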
So I would go with option 1, or possibly option 2, and discount the others.