Why are these two SQL queries so different in efficiency?
I have to use SQL for my internship. I know the basics, but I don't have a real programming background and don't know what makes code efficient.
Query #1
SELECT DISTINCT
    c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
FROM
    (SELECT *
     FROM
         (SELECT
              *,
              ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
          FROM
              TABLE) AS b
    ) AS c
LEFT JOIN
    (SELECT
         *
     FROM
         (SELECT
              *,
              ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS d
          FROM
              TABLE) AS e
    ) AS f ON c.[ID] = f.[ID] AND a = d - 1
ORDER BY
    c.[STAT], c.[EVENT], f.[STAT], f.[EVENT]
Query #2
SELECT DISTINCT
    b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
FROM
    (SELECT
         *,
         ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS a
     FROM TABLE) AS b
LEFT JOIN
    (SELECT
         *,
         ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS c
     FROM TABLE) AS d ON b.[ID] = d.[ID] AND a = c - 1
ORDER BY
    b.[STAT], b.[EVENT], d.[STAT], d.[EVENT]
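Queries #1 and #2 return the same results, as expected, but query #1 takes about 5 seconds to run while query #2 takes about 1 minute 35 seconds. In other words, the second query takes 1.5 minutes longer than the first, and I'd really like to know why.
The correct way to write this query is with lead(). I'm pretty sure select distinct isn't needed, so this is what you want: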
SELECT stat, event,
       LEAD(stat) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS next_stat,
       LEAD(event) OVER (PARTITION BY ID ORDER BY PROCDT, PROCTIME) AS next_event
FROM TABLE t
ORDER BY stat, event;
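As a rough sketch of how LEAD() pairs each row with its successor, here is a small self-contained example. The temp table #events, its column types, and the sample rows are made up for illustration, since the question doesn't show the real table definition, and LEAD() assumes SQL Server 2012 or later. DISTINCT is kept only to show how the deduplicated transitions from the original queries could be reproduced; drop it if one output row per source row is what you want.

```sql
-- Hypothetical sample data; #events stands in for the real table.
CREATE TABLE #events (
    [ID]       int         NOT NULL,
    [PROCDT]   date        NOT NULL,
    [PROCTIME] time(0)     NOT NULL,
    [STAT]     varchar(10) NOT NULL,
    [EVENT]    varchar(10) NOT NULL
);

INSERT INTO #events ([ID], [PROCDT], [PROCTIME], [STAT], [EVENT])
VALUES (1, '2020-01-01', '08:00:00', 'OPEN',   'CREATE'),
       (1, '2020-01-01', '09:30:00', 'ACTIVE', 'UPDATE'),
       (1, '2020-01-02', '10:00:00', 'CLOSED', 'CLOSE'),
       (2, '2020-01-01', '08:15:00', 'OPEN',   'CREATE'),
       (2, '2020-01-03', '11:00:00', 'ACTIVE', 'UPDATE');

-- Pair each row with the next row of the same ID (ordered by date, then time).
-- The last row of each ID has no successor, so its next_* columns are NULL.
SELECT DISTINCT t.[STAT], t.[EVENT], t.next_stat, t.next_event
FROM (
    SELECT [STAT], [EVENT],
           LEAD([STAT])  OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS next_stat,
           LEAD([EVENT]) OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS next_event
    FROM #events
) AS t
ORDER BY t.[STAT], t.[EVENT], t.next_stat, t.next_event;

DROP TABLE #events;
```

With this sample data the five input rows reduce to four distinct transitions, including the NULL pairs produced for the last row of each ID. The table is read once, with no self-join.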
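The two queries you wrote should be equivalent in SQL Server. Evidently the extra subqueries confuse the optimizer. You would need to look at the execution plans to understand this better.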
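One way to start comparing the two queries is to have SQL Server report I/O and timing for each run. This is a minimal sketch, not a full tuning recipe. The temp table #events and its rows are hypothetical stand-ins for the real table, and the self-join shown is only an example of a query to measure.

```sql
-- Hypothetical stand-in for the real table.
CREATE TABLE #events (
    [ID] int, [PROCDT] date, [PROCTIME] time(0),
    [STAT] varchar(10), [EVENT] varchar(10)
);

INSERT INTO #events ([ID], [PROCDT], [PROCTIME], [STAT], [EVENT])
VALUES (1, '2020-01-01', '08:00:00', 'OPEN',   'CREATE'),
       (1, '2020-01-01', '09:30:00', 'ACTIVE', 'UPDATE');

-- Report logical reads and CPU/elapsed time for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Put the query under investigation here; run each variant in turn
-- and compare the figures reported for them.
SELECT b.[STAT], b.[EVENT], d.[STAT] AS next_stat, d.[EVENT] AS next_event
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS rn
      FROM #events) AS b
LEFT JOIN
     (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [PROCDT], [PROCTIME]) AS rn
      FROM #events) AS d
  ON b.[ID] = d.[ID] AND b.rn = d.rn - 1;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

DROP TABLE #events;
```

In SQL Server Management Studio the figures appear on the Messages tab. Turning on "Include Actual Execution Plan" before running each query shows the plan the optimizer chose, which is where a difference between the two versions would show up.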