SQL Query - long running / taking up CPU resource

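Hi, I have the SQL query below, which takes on average 40 minutes to run. One of the tables it references has over 7 million records.

I ran this through the Database Tuning Advisor and applied all of its recommendations. I also assessed it in the Activity Monitor in SQL Server, and no further indexes etc. were recommended.

Any suggestions would be great, thanks in advance.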

WITH CTE AS 
(
    SELECT r.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    r.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM Result R
    INNER JOIN Job J ON R.JobId = J.Id
    INNER JOIN User C ON J.UserId = C.Id
    WHERE 
    ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
    AND ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT)
    AND ISNULL(R.Email,'') <> '' -- has an email address
    AND ISNULL(R.EmailSent, CAST(0 AS BIT)) = CAST(0 AS BIT) -- email has not been sent
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND ISNULL(R.EmailStatus,'') = '' -- email has not been sent
    AND ISNULL(R.IsEmailSubscribe, 'True') <> 'False' -- not unsubscribed
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE  ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- list of ids
    )
    AND J.Id NOT IN (
        -- list of ids
    )
    AND J.ClientId NOT IN 
    (
        -- list of ids
    )
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, ConsultantId, ConsultantEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1

Please see my updated query below:

WITH CTE AS 
(
    SELECT R.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    R.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM RESULTS R
    INNER JOIN JOB J ON R.JobId = J.Id
    INNER JOIN Consultant C ON J.UserId = C.Id
    WHERE 
    J.DCApproved = 1
    AND (J.Closed = 0 OR J.Closed IS NULL)
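    AND R.Email <> '' -- has an email address; this comparison also rejects NULL, unlike (R.Email <> '' OR R.Email IS NOT NULL), which lets '' through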
    AND (R.EmailSent = 0 OR R.EmailSent IS NULL)
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND (R.EmailStatus = '' OR R.EmailStatus IS NULL)
    AND (R.IsEmailSubscribe = 'True' OR R.IsEmailSubscribe IS NULL)
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE (u.EmailAddress = R.Email OR (u.EmailAddress IS NULL AND R.Email IS NULL))

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.ClientId NOT IN 
    (
        -- LIST OF IDS
    )
)

INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, UserId, UserEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1


GO

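In general, 7 million records are a joke for a modern database. If you want to talk about size problems, you should be talking about billions of rows, not 7 million.

That points to a problem with the query. High CPU is usually a sign of mismatched field types (comparing a string in one table with a number in another) or of functions being called too often. Long running is usually a sign of missing indexes or of non-sargable conditions, and you are really trying hard to have those.

Non-sargability means an index cannot be used. Your conditions are all examples of it: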

ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)

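ISNULL(field, value) means an index on the field cannot be used - basically "goodbye index, hello table scan". It is also unnecessary here: a NULL that has been replaced by 0 can never equal 1, so

J.Approved = 1

means the same thing, but it is sargable. Almost all of your conditions are written in a non-sargable way - welcome to db hell. Start rewriting.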
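As a rough illustration of that rewrite, the sketch below compares the ISNULL form with a bare-column form on a throwaway table variable. The table `@Job` and its four rows are made up for the example; only the column names come from the question.

```sql
-- Hypothetical sample data covering the states of two nullable BIT columns.
DECLARE @Job TABLE (Id INT PRIMARY KEY, Approved BIT NULL, Closed BIT NULL);

INSERT INTO @Job (Id, Approved, Closed)
VALUES (1, 1, 0),        -- approved, open
       (2, 0, 1),        -- not approved, closed
       (3, NULL, NULL),  -- approval unknown
       (4, 1, NULL);     -- approved, Closed never set

-- Non-sargable: each column is wrapped in a function.
SELECT Id
FROM @Job
WHERE ISNULL(Approved, 0) = 1
  AND ISNULL(Closed, 0) = 0
ORDER BY Id;

-- Same filter with bare columns, which an index on those columns can serve.
-- ISNULL(x, 0) = 1  becomes  x = 1
-- ISNULL(x, 0) = 0  becomes  (x = 0 OR x IS NULL)
SELECT Id
FROM @Job
WHERE Approved = 1
  AND (Closed = 0 OR Closed IS NULL)
ORDER BY Id;
```

Both statements return Ids 1 and 4, so the rewrite changes only how the optimizer can evaluate the filter, not which rows qualify.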

You may want to read more about sargability at https://www.techopedia.com/definition/28838/sargeable

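Also make sure you have indexes on all relevant foreign keys (and on the primary keys they reference) - otherwise, again, welcome to table scans.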
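A minimal sketch of what such indexes might look like for the joins in the question. It assumes the tables and columns exist as shown in the query. The index names are invented for illustration, and some of these indexes may already exist after the Tuning Advisor run, so check before creating duplicates.

```sql
-- Assumption: tables Result, Job and SMTP_Production exist as in the question.
-- All index names below are hypothetical.

-- Foreign key used by: INNER JOIN Job J ON R.JobId = J.Id
CREATE NONCLUSTERED INDEX IX_Result_JobId
    ON Result (JobId);

-- Foreign key used by: INNER JOIN ... C ON J.UserId = C.Id
CREATE NONCLUSTERED INDEX IX_Job_UserId
    ON Job (UserId);

-- Lookup columns used by the NOT EXISTS check
-- WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
CREATE NONCLUSTERED INDEX IX_SMTP_Production_JobId_CandidateId
    ON SMTP_Production (JobId, CandidateId);
```

The referenced primary keys (Job.Id and the user table's Id) normally already have an index through their PRIMARY KEY constraint.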

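Using ISNULL in your WHERE and JOIN clauses is probably the main cause here. Applying a function to a column in a query makes the query non-SARGable (meaning it cannot use any of the indexes on your table(s), so it scans the whole thing). Note: applying a function to a variable in the WHERE clause is usually fine, for example WHERE SomeColumn = DATEADD(DAY, @n, @SomeDate). Something like WHERE SomeColumn = ISNULL(@Variable,0) has the smell of a "catch-all query", so it can be a performance killer, depending on your setup. That is not the discussion at hand, though.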
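To make the column-versus-variable distinction concrete, here is a small sketch. The table `@Orders`, its rows and the variable values are all invented for the example.

```sql
-- Hypothetical sample table; in a real schema OrderDate would be an indexed column.
DECLARE @Orders TABLE (Id INT PRIMARY KEY, OrderDate DATE NOT NULL);

INSERT INTO @Orders (Id, OrderDate)
VALUES (1, '20240115'), (2, '20240116'), (3, '20240110');

DECLARE @SomeDate DATE = '20240110', @n INT = 5;

-- Function applied to the column: it has to be evaluated for every row,
-- so an index on OrderDate could not be used for a seek.
SELECT Id
FROM @Orders
WHERE DATEADD(DAY, -@n, OrderDate) = @SomeDate;

-- Function applied to the variables only: it is evaluated once, and the
-- bare column is compared with the result, which an index can serve.
SELECT Id
FROM @Orders
WHERE OrderDate = DATEADD(DAY, @n, @SomeDate);
```

Both statements return Id 1. The second form is the one to aim for whenever the arithmetic can be moved off the column.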

Clauses like ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT) give the query optimizer a real headache, and your query is full of them. You need to replace them with clauses like the following:

WHERE (J.Closed = 0 OR J.Closed IS NULL)

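Although it makes no difference, there is also no need to CAST the 0. SQL Server can see that you are comparing against a bit, so it will interpret the 0 as one as well.

You also have a NOT EXISTS with the WHERE clause ISNULL(u.EmailAddress, '') = ISNULL(R.Email, ''). This needs to become: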

WHERE (u.EmailAddress = R.Email
  OR   (u.EmailAddress IS NULL AND R.Email IS NULL))
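One further observation, offered as a sketch rather than a definitive fix: the question's CTE already requires a non-empty R.Email (ISNULL(R.Email,'') <> ''), so for the rows that survive that filter the NULL branch can never match, and a plain equality may be enough. The table variables and addresses below are made up for the example.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @Result TABLE (Id INT PRIMARY KEY, Email VARCHAR(100) NULL);
DECLARE @Unsubscribe TABLE (Id INT PRIMARY KEY, EmailAddress VARCHAR(100) NULL);

INSERT INTO @Result (Id, Email)
VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, NULL), (4, '');

INSERT INTO @Unsubscribe (Id, EmailAddress)
VALUES (1, 'b@example.com'), (2, NULL);

SELECT R.Id
FROM @Result R
WHERE R.Email <> ''            -- rejects both NULL and the empty string
  AND NOT EXISTS (
        SELECT 1
        FROM @Unsubscribe u
        WHERE u.EmailAddress = R.Email  -- bare columns on both sides
      )
ORDER BY R.Id;
```

This returns only Id 1: row 2 is unsubscribed, and rows 3 and 4 have no usable address.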

You need to change all of the ISNULL usages in your WHERE clauses (in the CTE and in the subqueries), and you should see a decent performance increase.