Why is DISTINCT on 3 identical values so slow?

Question

I have a performance problem that I cannot understand. The query I am running is:

SELECT UNIQUE GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917))
  WHERE pos.pos_id = COLUMN_VALUE AND GROUP_ID <> 0;

1 row returned in 99 ms

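You may think that 99 ms is fine, but what really puzzles me is that removing UNIQUE makes the query much faster, even though there are only 3 duplicate values: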

SELECT GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917))
  WHERE pos.pos_id = COLUMN_VALUE AND GROUP_ID <> 0;

3 rows returned in 0.048 ms

Does Oracle really need almost 100 ms to reduce a set of 3 elements to a single value? That seemed crazy to me, so I started investigating with TKPROF.

Here is the execution plan:

SELECT UNIQUE GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917))
 WHERE pos.pos_id = COLUMN_VALUE AND GROUP_ID <> 0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.09          0          8          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.10          0         10          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  HASH UNIQUE (cr=8 pr=0 pw=0 time=99258 us cost=31 size=3892 card=139)
         3          3          3   NESTED LOOPS  (cr=8 pr=0 pw=0 time=84 us)
         3          3          3    NESTED LOOPS  (cr=5 pr=0 pw=0 time=59 us cost=30 size=3892 card=139)
         3          3          3     COLLECTION ITERATOR CONSTRUCTOR FETCH (cr=0 pr=0 pw=0 time=7 us cost=29 size=16336 card=8168)
         3          3          3     INDEX UNIQUE SCAN IU_POS_POS_ID (cr=5 pr=0 pw=0 time=31 us cost=0 size=0 card=1)(object id 20684)
         3          3          3    TABLE ACCESS BY INDEX ROWID POS (cr=3 pr=0 pw=0 time=12 us cost=0 size=26 card=1)

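Well, it looks like Oracle really does spend 99 ms in the HASH UNIQUE operation to reduce 3 identical values to one. That still seems crazy; it should be barely noticeable. So I started looking for alternatives. It turns out Oracle can also implement DISTINCT with a SORT UNIQUE operation. I also found that I can force Oracle not to use HASH UNIQUE with: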

ALTER SESSION SET "_gby_hash_aggregation_enabled" = false;

Now my query runs in 0.048 ms! That is a 2000x improvement! WTF! Here is the execution plan showing that it is using SORT UNIQUE:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          4          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         16          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.00       0.00          0         20          0           2

Misses in library cache during parse: 2
Optimizer mode: ALL_ROWS
Parsing user id: 58  
Number of plan statistics captured: 2

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT UNIQUE (cr=8 pr=0 pw=0 time=48 us cost=30 size=28 card=1)
         3          3          3   NESTED LOOPS  (cr=8 pr=0 pw=0 time=35 us)
         3          3          3    NESTED LOOPS  (cr=5 pr=0 pw=0 time=26 us cost=29 size=28 card=1)
         3          3          3     COLLECTION ITERATOR CONSTRUCTOR FETCH (cr=0 pr=0 pw=0 time=2 us cost=29 size=6 card=3)
         3          3          3     INDEX UNIQUE SCAN IU_POS_POS_ID (cr=5 pr=0 pw=0 time=17 us cost=0 size=0 card=1)(object id 20684)
         3          3          3    TABLE ACCESS BY INDEX ROWID POS (cr=3 pr=0 pw=0 time=7 us cost=0 size=26 card=1)

********************************************************************************

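OK, so now I was wondering how to force Oracle to use SORT UNIQUE without touching session parameters. It turns out I can simply add an ORDER BY clause, which speeds up my query: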

SELECT UNIQUE GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917))
 WHERE pos.pos_id = COLUMN_VALUE AND GROUP_ID <> 0
 ORDER BY GROUP_ID;

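So basically I have solved my problem (if you can call a hack "solving a problem"), but I am still puzzled.

Here are my questions:

Why is Oracle choosing HASH UNIQUE over SORT UNIQUE?
Why is HASH UNIQUE so slow when there are only 3 records to hash?
Is there a better way to hint Oracle to use SORT UNIQUE over HASH UNIQUE?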

Answer 1

Why is Oracle choosing HASH UNIQUE over SORT UNIQUE?

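Your first tkprof output shows a much higher (estimated) cardinality for the TABLE iterator: card=8168, compared with card=3 in the second. Oracle is optimizing for far more than three rows.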
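The estimate can also be checked without tracing, by asking the optimizer for its plan. A minimal sketch using the query from the question; the Rows column on the COLLECTION ITERATOR line shows the row count the optimizer assumes for the collection:

```sql
-- Ask the optimizer for the plan without executing the query
EXPLAIN PLAN FOR
SELECT UNIQUE GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917))
 WHERE pos.pos_id = COLUMN_VALUE AND GROUP_ID <> 0;

-- Display the plan that was just explained, including estimated rows per step
SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);
```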

Why is HASH UNIQUE so slow when there are only 3 records to hash?

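HASH UNIQUE has to build a hash table no matter how many elements are hashed. Presumably the hash table would be smaller for fewer elements, but again the cardinality estimate suggests that Oracle will build a larger hash table.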

Is there a better way to hint oracle to use SORT UNIQUE over HASH UNIQUE?

I would try hinting the cardinality on the TABLE operator.
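As a rough sketch of what that could look like: the CARDINALITY hint below is undocumented (though widely used), and the alias t is an assumption added here so that the hint has something to refer to. Whether the optimizer then picks SORT UNIQUE should be confirmed against the actual plan.

```sql
-- Tell the optimizer to expect 3 rows from the collection.
-- CARDINALITY is an undocumented hint; "t" is an alias introduced for this example.
SELECT /*+ CARDINALITY(t 3) */ UNIQUE GROUP_ID
  FROM pos, TABLE (t_type(12984918, 12984919, 12984917)) t
 WHERE pos.pos_id = t.COLUMN_VALUE AND GROUP_ID <> 0;
```

With the estimate corrected at the source, there is no need to change a hidden session parameter or to add an ORDER BY purely for its side effect.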

Answer 2

Is there a better way to hint oracle to use SORT UNIQUE over HASH UNIQUE?

I would check how the table and index statistics were gathered. The Oracle optimizer uses them to build an efficient access plan.
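One possible way to do that check, assuming the POS table lives in the current schema (adjust the owner otherwise):

```sql
-- When were the table statistics last gathered, and how many rows do they report?
SELECT table_name, num_rows, last_analyzed
  FROM user_tables
 WHERE table_name = 'POS';

-- The same for the indexes on POS
SELECT index_name, num_rows, distinct_keys, last_analyzed
  FROM user_indexes
 WHERE table_name = 'POS';

-- Re-gather table statistics; cascade => TRUE also gathers index statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'POS', cascade => TRUE);
END;
/
```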

oracle

query-optimization

profiler