Using an empty OVER() clause in a subquery
I have an Oracle SQL query that uses a correlated subquery:
Q1
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate)
FROM table1 t1
JOIN table2 t2 ON t2.code = t1.code
JOIN table3 t3 ON t3.id = t1.id
JOIN table4 t4 ON t4.code = t1.code
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= DATE '2019-07-01'  -- ANSI DATE literals avoid relying on NLS_DATE_FORMAT
  AND t3.processed_date < DATE '2019-07-22'
  AND t2.effective_date IN (SELECT max(tc.effective_date)
                            FROM tableCore tc
                            WHERE tc.effective_date <= t3.processed_date
                              AND t1.code = tc.code)
I changed the subquery to use an empty OVER() clause, which significantly improved the performance of this query:
Q2
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate)
FROM table1 t1
JOIN table2 t2 ON t2.code = t1.code
JOIN table3 t3 ON t3.id = t1.id
JOIN table4 t4 ON t4.code = t1.code
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= DATE '2019-07-01'
  AND t3.processed_date < DATE '2019-07-22'
  AND t2.effective_date IN (SELECT max(tc.effective_date) OVER () AS ed
                            FROM tableCore tc
                            WHERE tc.effective_date <= t3.processed_date
                              AND t1.code = tc.code)
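The new query returns the same result set as the original, so it seems to work... but why is the explain plan so different? The subquery still appears to be correlated, so is it no longer evaluated for every row of the outer query? Why?
I'd like to understand what is happening in the second query.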
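A minimal sketch of why Q1 and Q2 return the same rows, using Python's standard sqlite3 module as a stand-in for Oracle (it assumes the bundled SQLite is 3.25 or later, the release that added window functions). The tables txn and rates and all of their data are hypothetical simplifications: txn stands in for table1/table3, and rates stands in for both table2 and tableCore. The sketch shows that max() OVER () is computed for every row the subquery's WHERE clause keeps, so the subquery returns the same maximum once per qualifying row, and IN treats the repeated values as a single value.

```python
import sqlite3

# Hypothetical, simplified schema: txn ~ table1/table3, rates ~ table2/tableCore.
# Dates are ISO-8601 TEXT so that plain string comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn   (id INTEGER, code TEXT, processed_date TEXT, total REAL);
CREATE TABLE rates (code TEXT, effective_date TEXT, rate REAL);
INSERT INTO txn VALUES (1, 'A', '2019-07-05', 100.0),
                       (2, 'A', '2019-07-20', 100.0),
                       (3, 'B', '2019-07-10', 50.0);
INSERT INTO rates VALUES ('A', '2019-07-01', 2.0), ('A', '2019-07-15', 4.0),
                         ('B', '2019-06-01', 5.0), ('B', '2019-07-12', 10.0);
""")

# Q1 pattern: plain aggregate, so the subquery yields exactly one row per outer row.
q1 = """
SELECT t.id, t.total / r.rate
FROM txn t JOIN rates r ON r.code = t.code
WHERE r.effective_date IN (SELECT max(tc.effective_date)
                           FROM rates tc
                           WHERE tc.effective_date <= t.processed_date
                             AND tc.code = t.code)
ORDER BY t.id
"""

# Q2 pattern: analytic max() OVER () over all rows the subquery keeps.
q2 = """
SELECT t.id, t.total / r.rate
FROM txn t JOIN rates r ON r.code = t.code
WHERE r.effective_date IN (SELECT max(tc.effective_date) OVER ()
                           FROM rates tc
                           WHERE tc.effective_date <= t.processed_date
                             AND tc.code = t.code)
ORDER BY t.id
"""

rows_q1 = conn.execute(q1).fetchall()
rows_q2 = conn.execute(q2).fetchall()
print(rows_q1)  # [(1, 50.0), (2, 25.0), (3, 10.0)]
print(rows_q2)  # same rows as Q1

# What each subquery form returns for transaction 2 (code 'A', processed 2019-07-20).
agg = conn.execute("SELECT max(effective_date) FROM rates "
                   "WHERE code = 'A' AND effective_date <= '2019-07-20'").fetchall()
win = conn.execute("SELECT max(effective_date) OVER () FROM rates "
                   "WHERE code = 'A' AND effective_date <= '2019-07-20'").fetchall()
print(agg)  # [('2019-07-15',)]
print(win)  # [('2019-07-15',), ('2019-07-15',)]
```

This only illustrates why the result sets match. It says nothing about the Oracle execution plan, which depends on how Oracle's optimizer transforms each form of the subquery.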
I thought I could also rewrite this query a third way, using row_number() OVER (partition by ...):
Q3
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate),
       ct.*
FROM table1 t1
JOIN table2 t2 ON t2.code = t1.code
JOIN table3 t3 ON t3.id = t1.id
JOIN table4 t4 ON t4.code = t1.code
JOIN (SELECT ct.*,
             row_number() OVER (PARTITION BY ct.code ORDER BY ct.effective_date ASC) AS rn
      FROM tablecore ct) ct
  ON t1.code = ct.code
  AND ct.rn = 1
  AND ct.effective_date <= t3.processed_date
WHERE t2.effective_date = ct.effective_date
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= DATE '2019-07-01'
  AND t3.processed_date < DATE '2019-07-22';
This version also seems to work, but it is slower than the second one.
EDIT
As @Christian pointed out, Q3 returns incorrect results.
Why use IN when you can simply use the = operator?
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate)
FROM table1 t1
JOIN table2 t2 ON t2.code = t1.code
JOIN table3 t3 ON t3.id = t1.id
JOIN table4 t4 ON t4.code = t1.code
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= DATE '2019-07-01'
  AND t3.processed_date < DATE '2019-07-22'
  AND t2.effective_date = (SELECT max(tc.effective_date)
                           FROM tableCore tc
                           WHERE tc.effective_date <= t3.processed_date
                             AND t1.code = tc.code)
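A sketch of why = is safe with the aggregate subquery but not with the OVER () form, again using Python's sqlite3 module and hypothetical txn/rates stand-ins for the real tables. An aggregate query without GROUP BY always returns exactly one row (NULL if nothing qualifies), whereas max() OVER () returns one row per qualifying tableCore row. Oracle raises ORA-01427 (single-row subquery returns more than one row) when = meets more than one row; SQLite does not raise that error (it silently uses the first row), so the sketch counts the rows each form returns instead.

```python
import sqlite3

# Hypothetical, simplified schema: txn ~ table1/table3, rates ~ table2/tableCore.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn   (id INTEGER, code TEXT, processed_date TEXT, total REAL);
CREATE TABLE rates (code TEXT, effective_date TEXT, rate REAL);
INSERT INTO txn VALUES (1, 'A', '2019-07-05', 100.0),
                       (2, 'A', '2019-07-20', 100.0),
                       (3, 'B', '2019-07-10', 50.0);
INSERT INTO rates VALUES ('A', '2019-07-01', 2.0), ('A', '2019-07-15', 4.0),
                         ('B', '2019-06-01', 5.0), ('B', '2019-07-12', 10.0);
""")

# The answer's form: compare with = against a scalar aggregate subquery.
rows_eq = conn.execute("""
SELECT t.id, t.total / r.rate
FROM txn t JOIN rates r ON r.code = t.code
WHERE r.effective_date = (SELECT max(tc.effective_date)
                          FROM rates tc
                          WHERE tc.effective_date <= t.processed_date
                            AND tc.code = t.code)
ORDER BY t.id
""").fetchall()
print(rows_eq)  # [(1, 50.0), (2, 25.0), (3, 10.0)]

# Number of rows each subquery form returns for every outer transaction.
AGG = "SELECT max(effective_date) FROM rates WHERE code = ? AND effective_date <= ?"
WIN = "SELECT max(effective_date) OVER () FROM rates WHERE code = ? AND effective_date <= ?"
row_counts = {}
for txn_id, code, processed in conn.execute(
        "SELECT id, code, processed_date FROM txn ORDER BY id").fetchall():
    agg_rows = len(conn.execute(AGG, (code, processed)).fetchall())
    win_rows = len(conn.execute(WIN, (code, processed)).fetchall())
    row_counts[txn_id] = (agg_rows, win_rows)
print(row_counts)  # {1: (1, 1), 2: (1, 2), 3: (1, 1)}
```

Transaction 2 is the case where the OVER () form returns two rows; with = that is exactly the situation Oracle would reject.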
The significant performance improvement is due to reduced disk I/O.
Another option you can try:
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate)
FROM table1 t1
JOIN table2 t2 ON t2.code = t1.code
JOIN table3 t3 ON t3.id = t1.id
JOIN table4 t4 ON t4.code = t1.code
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= DATE '2019-07-01'
  AND t3.processed_date < DATE '2019-07-22'
  AND EXISTS (SELECT 1
              FROM tableCore tc
              WHERE tc.effective_date <= t3.processed_date
                AND t1.code = tc.code
              HAVING t2.effective_date = max(tc.effective_date))
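A sketch checking that the EXISTS ... HAVING variant returns the same rows as Q1, using the same hypothetical txn/rates stand-ins in SQLite through Python's sqlite3 module. One deliberate change: SQLite releases before 3.39 reject HAVING without GROUP BY, so this version adds GROUP BY tc.code. Because tc.code is already pinned to the outer row's code, there is at most one group, and when no rows qualify there is no group at all, so EXISTS is false, just as a NULL max() makes it false in the Oracle original.

```python
import sqlite3

# Hypothetical, simplified schema: txn ~ table1/table3, rates ~ table2/tableCore.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn   (id INTEGER, code TEXT, processed_date TEXT, total REAL);
CREATE TABLE rates (code TEXT, effective_date TEXT, rate REAL);
INSERT INTO txn VALUES (1, 'A', '2019-07-05', 100.0),
                       (2, 'A', '2019-07-20', 100.0),
                       (3, 'B', '2019-07-10', 50.0);
INSERT INTO rates VALUES ('A', '2019-07-01', 2.0), ('A', '2019-07-15', 4.0),
                         ('B', '2019-06-01', 5.0), ('B', '2019-07-12', 10.0);
""")

# Reference result: the Q1 pattern with an aggregate IN subquery.
rows_q1 = conn.execute("""
SELECT t.id, t.total / r.rate
FROM txn t JOIN rates r ON r.code = t.code
WHERE r.effective_date IN (SELECT max(tc.effective_date)
                           FROM rates tc
                           WHERE tc.effective_date <= t.processed_date
                             AND tc.code = t.code)
ORDER BY t.id
""").fetchall()

# EXISTS variant: keep the rate row only if its date equals the latest qualifying date.
rows_exists = conn.execute("""
SELECT t.id, t.total / r.rate
FROM txn t JOIN rates r ON r.code = t.code
WHERE EXISTS (SELECT 1
              FROM rates tc
              WHERE tc.effective_date <= t.processed_date
                AND tc.code = t.code
              GROUP BY tc.code  -- added for SQLite < 3.39; at most one group here
              HAVING r.effective_date = max(tc.effective_date))
ORDER BY t.id
""").fetchall()
print(rows_exists)  # [(1, 50.0), (2, 25.0), (3, 10.0)]
```

Equal results on toy data do not say which form Oracle runs faster; that still needs the real explain plans.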
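For a proper performance analysis, it would help to have the table sizes and the explain plans of the different queries (and to check whether the optimizer statistics are accurate).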
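FYI: your last query using row_number() will not return the same results, because the window function is evaluated before the join condition ct.effective_date <= t3.processed_date is applied, so you will miss some rows. In addition, ORDER BY ct.effective_date ASC makes rn = 1 the earliest effective date per code, not the latest one on or before t3.processed_date.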
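A sketch of the row_number() pitfall on the same hypothetical txn/rates data, run in SQLite through Python's sqlite3 module. The inline view numbers the rates per code before the date condition is applied. With the question's ASC ordering, rn = 1 is the earliest rate for the code, so a transaction processed after a later rate change gets the old rate; with DESC, rn = 1 is the latest rate overall, which is then rejected for transactions processed before it, so those rows disappear. The last query is one possible window-function rewrite, an assumption rather than something from the thread: filter by date first, then rank within each outer row. PARTITION BY t.id assumes id identifies one outer row, which holds for this toy data.

```python
import sqlite3

# Hypothetical, simplified schema: txn ~ table1/table3, rates ~ table2/tableCore.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn   (id INTEGER, code TEXT, processed_date TEXT, total REAL);
CREATE TABLE rates (code TEXT, effective_date TEXT, rate REAL);
INSERT INTO txn VALUES (1, 'A', '2019-07-05', 100.0),
                       (2, 'A', '2019-07-20', 100.0),
                       (3, 'B', '2019-07-10', 50.0);
INSERT INTO rates VALUES ('A', '2019-07-01', 2.0), ('A', '2019-07-15', 4.0),
                         ('B', '2019-06-01', 5.0), ('B', '2019-07-12', 10.0);
""")

# Q3 pattern: rows are numbered per code BEFORE the date filter in the ON clause.
q3_template = """
SELECT t.id, t.total / r.rate
FROM txn t
JOIN rates r ON r.code = t.code
JOIN (SELECT c.*,
             row_number() OVER (PARTITION BY c.code ORDER BY c.effective_date {order}) AS rn
      FROM rates c) ct
  ON ct.code = t.code AND ct.rn = 1 AND ct.effective_date <= t.processed_date
WHERE r.effective_date = ct.effective_date
ORDER BY t.id
"""
q3_asc = conn.execute(q3_template.format(order="ASC")).fetchall()
q3_desc = conn.execute(q3_template.format(order="DESC")).fetchall()

# Possible fix (sketch): apply the date filter first, then keep the latest rate per txn.
fixed = conn.execute("""
SELECT id, total / rate
FROM (SELECT t.id, t.total, c.rate,
             row_number() OVER (PARTITION BY t.id ORDER BY c.effective_date DESC) AS rn
      FROM txn t
      JOIN rates c ON c.code = t.code AND c.effective_date <= t.processed_date) ranked
WHERE rn = 1
ORDER BY id
""").fetchall()

print(q3_asc)   # [(1, 50.0), (2, 50.0), (3, 10.0)] -> txn 2 gets the 2019-07-01 rate
print(q3_desc)  # [(2, 25.0)] -> txns 1 and 3 are missing
print(fixed)    # [(1, 50.0), (2, 25.0), (3, 10.0)]
```

Whether a rewrite like this helps on Oracle would have to be checked against the actual explain plans and table sizes.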