Using empty OVER() clause in subquery

I have an Oracle SQL query that uses a correlated subquery:

Q1

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date IN (SELECT max(tc.effective_date)
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)

I changed the subquery to use an empty OVER() clause, which significantly improved the performance of this query:

Q2

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date IN (SELECT max(tc.effective_date) OVER () AS ed
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)

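The new query returns the same result set as the original, so it seems to work. But why is the explain plan so different? The subquery still appears to be correlated; is it no longer evaluated for every row of the outer query? If so, why?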

I would like to understand what is happening in the second query.

I think I could also rewrite this query in a third way, using row_number() OVER (PARTITION BY ...):

Q3

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate),
       ct.*
FROM table1 t1
         JOIN table2 t2 ON t2.code = t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
         JOIN (SELECT ct.*, row_number() OVER (PARTITION BY ct.code ORDER BY ct.effective_date ASC) AS rn
               FROM tablecore ct) ct
              ON t1.code = ct.code
                  AND rn = 1
                  AND ct.effective_date <= t3.processed_date
WHERE t2.effective_date in(ct.effective_date)
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= '01 JUL 2019'
  AND t3.processed_date < '22 JUL 2019'
  AND t2.effective_date IN (ct.effective_date);

This version also seems to work, but it is slower than the second version.

Edit: as @Christian pointed out, Q3 returns incorrect results.

Why use IN when you can simply use the = operator?

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date = (SELECT max(tc.effective_date)
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)
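
One caveat if you apply the same change to Q2: an aggregate without GROUP BY always returns exactly one row, while the analytic form max(...) OVER () returns one row per input row, each carrying the same maximum. That is why Q2 needs IN; with =, Oracle would raise ORA-01427 (single-row subquery returns more than one row) as soon as the subquery sees more than one row. A minimal sketch, using a hypothetical toy table over_demo that is not one of the tables from the question:

```sql
-- Hypothetical toy data, for illustration only.
CREATE TABLE over_demo (code VARCHAR2(10), effective_date DATE);
INSERT INTO over_demo VALUES ('A', DATE '2019-06-01');
INSERT INTO over_demo VALUES ('A', DATE '2019-07-10');

-- Aggregate form: exactly one row (2019-07-10).
SELECT max(effective_date) AS ed
FROM over_demo
WHERE code = 'A';

-- Analytic form: one row per matching input row, here two rows,
-- both showing 2019-07-10.
SELECT max(effective_date) OVER () AS ed
FROM over_demo
WHERE code = 'A';

-- When nothing matches, the aggregate still returns one row (NULL),
-- while the analytic form returns no rows at all.
SELECT max(effective_date) AS ed
FROM over_demo
WHERE code = 'Z';

SELECT max(effective_date) OVER () AS ed
FROM over_demo
WHERE code = 'Z';
```

In both the matching and the non-matching case an outer IN predicate is satisfied by the same rows, which is consistent with Q1 and Q2 returning the same result set.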

The significant performance improvement comes from reduced disk I/O.

You could try another option:

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND EXISTS (SELECT 1
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code
                                HAVING t2.effective_date = max(tc.effective_date))
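
In case the HAVING without GROUP BY looks unusual: the rows that survive the WHERE clause form a single group, and HAVING then keeps or discards that one group, so the subquery returns either one row or none. A small sketch with inline, made-up data (the demo_rates name and its values are assumptions for illustration):

```sql
-- Group's maximum is 2019-07-10, so the condition holds: one row is returned.
WITH demo_rates AS (
    SELECT 'A' AS code, DATE '2019-06-01' AS effective_date FROM dual
    UNION ALL
    SELECT 'A', DATE '2019-07-10' FROM dual
)
SELECT 1
FROM demo_rates
WHERE code = 'A'
HAVING max(effective_date) = DATE '2019-07-10';

-- 2019-06-01 exists in the data but is not the maximum: no rows are returned.
WITH demo_rates AS (
    SELECT 'A' AS code, DATE '2019-06-01' AS effective_date FROM dual
    UNION ALL
    SELECT 'A', DATE '2019-07-10' FROM dual
)
SELECT 1
FROM demo_rates
WHERE code = 'A'
HAVING max(effective_date) = DATE '2019-06-01';
```

In the EXISTS version the literal date is replaced by the outer column t2.effective_date.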

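For a proper performance analysis, it would help to have the table sizes and the explain plans of the different queries (and also a check that the optimizer statistics are accurate).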
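
The table sizes, statistics and plans mentioned above can be collected roughly like this. It is only a sketch: it assumes the tables live in your own schema and were created with unquoted names (so the data dictionary stores them in upper case), and it uses the = variant of the query as the statement to explain.

```sql
-- 1. Capture the optimizer's plan for one variant of the query.
EXPLAIN PLAN FOR
SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate)
FROM table1 t1
         JOIN table2 t2 ON t2.code = t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date = (SELECT max(tc.effective_date)
                             FROM tableCore tc
                             WHERE tc.effective_date <= t3.processed_date
                               AND t1.code = tc.code);

-- 2. Display the plan that was just captured.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 3. Table sizes as recorded by the statistics, and when they were gathered.
--    num_rows and blocks come from the last statistics run, not a live count.
SELECT table_name, num_rows, blocks, last_analyzed
FROM user_tables
WHERE table_name IN ('TABLE1', 'TABLE2', 'TABLE3', 'TABLE4', 'TABLECORE');

-- 4. If last_analyzed is old or NULL, refresh the statistics for that table.
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLECORE');
END;
/
```

Repeat step 1 and step 2 for each variant (Q1, Q2 and so on) and compare the plans side by side.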

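FYI: your last query, the one using row_number(), will not return the same results, because the window function is evaluated before the join condition ct.effective_date <= t3.processed_date is applied, so you will miss some rows.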
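
Here is a small sketch of that effect, using hypothetical toy tables demo_rates and demo_events rather than the tables from the question. Note also that Q3 orders by effective_date ASC, so rn = 1 there is the earliest date per code, whereas max() asks for the latest; the sketch uses DESC.

```sql
-- Hypothetical toy data, for illustration only.
CREATE TABLE demo_rates  (code VARCHAR2(10), effective_date DATE);
CREATE TABLE demo_events (id NUMBER, code VARCHAR2(10), processed_date DATE);

INSERT INTO demo_rates  VALUES ('A', DATE '2019-06-01');
INSERT INTO demo_rates  VALUES ('A', DATE '2019-07-10');
INSERT INTO demo_events VALUES (1, 'A', DATE '2019-07-05');
INSERT INTO demo_events VALUES (2, 'A', DATE '2019-07-15');

-- Ranking BEFORE the date filter: only the overall latest rate (2019-07-10)
-- gets rn = 1. Event 1 was processed on 2019-07-05, so that rate fails the
-- date condition and event 1 drops out. Only event 2 is returned.
SELECT e.id, lr.effective_date
FROM demo_events e
         JOIN (SELECT r.*,
                      row_number() OVER (PARTITION BY r.code
                                         ORDER BY r.effective_date DESC) AS rn
               FROM demo_rates r) lr
              ON lr.code = e.code
                  AND lr.rn = 1
                  AND lr.effective_date <= e.processed_date;

-- Ranking AFTER the date filter, once per event: each event keeps its own
-- latest applicable rate. Returns (1, 2019-06-01) and (2, 2019-07-10).
SELECT ranked.id, ranked.effective_date
FROM (SELECT e.id,
             r.effective_date,
             row_number() OVER (PARTITION BY e.id
                                ORDER BY r.effective_date DESC) AS rn
      FROM demo_events e
               JOIN demo_rates r
                    ON r.code = e.code
                        AND r.effective_date <= e.processed_date) ranked
WHERE ranked.rn = 1;
```

The second form assumes id identifies each event row uniquely. Applied to the real query, the PARTITION BY would need whatever columns identify one row of the outer join result.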