令人费解的索引扫描性能。为什么扫描索引速度很慢，即使结果集很小并且有索引

Question

我有很大的 table ir_data (150GB)，其中包含不同日期的数据（val_date 列）。我需要知道在我的应用程序的各个点 ir_data 中是否有给定日期。

select distinct(val_date) from ir_data

下面的实验 ir_data 包含 val_date 的 29 个不同值。

设置 1

我希望 ir_data（val_date、key_id、other_colum）上的索引能够帮助快速找到 29 个值。事实上，这需要 5 多分钟:

Query 1 of 1, Rows read: 29, Elapsed time (seconds) - Total: 343.96, SQL query: 343.958, Reading results: 0.002

我一直希望索引是一棵树，其中节点存储在树结构中，例如像这样

val_date -> key_id   -> other_column -> data-nodes

1.1.2017 -> 0-50     -> A            -> (1.1.2017, 0, Automobile), (1.1.2017, 2, Amsterdam)
                     -> B-E          -> (1.1.2017, 12, Batman)
         -> 51-100   -> A            -> ...
                        X
         -> 666-1000 -> A
                     -> B-C
                     -> E
2.1.2017 -> ...

基于此结构，获得 29 个不同的 val_date 应该非常快。

问题：为什么要这么久？？？

子问题：有没有办法在不创建另一个 table 的情况下解决这个问题？

设置 2

我创建了另一个仅包含 val_date 的索引。这需要大约相同的时间。

查询计划：

    The type of query is SELECT.

2 operator(s) under root

   |ROOT:EMIT Operator (VA = 2)
   |
   |   |GROUP SORTED Operator (VA = 1)
   |   |Distinct
   |   |
   |   |   |SCAN Operator (VA = 0)
   |   |   |  FROM TABLE
   |   |   |  ir_data
   |   |   |  Index : ir_data_idx1 <-- this is the index containing only val_date.
   |   |   |  Forward Scan.
   |   |   |  Positioning at index start.
   |   |   |  Index contains all needed columns. Base table will not be read.
   |   |   |  Using I/O Size 16 Kbytes for index leaf pages.
   |   |   |  With MRU Buffer Replacement Strategy for index leaf pages.

Answer 1

您的 table 和您的索引一样大。正如您在计划中看到的，引擎执行索引扫描。此操作会很长，因为它将扫描您的整个索引以获取不同的值。

作为第一步，您可以在索引上尝试 update index statistics，不过，我真的认为它不会有帮助。

如果是一次性手动操作，我想你对5分钟的操作就满意了。

如果它是由您的应用程序执行的查询，那么您有 2 个我能想到的选择：

正如您在问题中所说 - 通过为日期创建额外的 table 并使用 FK 来规范化 table。
创建 precomputed result set。这是一个物化视图 - 结果像常规 table 一样存储（与仅存储其定义的视图相反）。
它将自动刷新视图中的结果并快速检索值。
重要：和索引一样，它会对Insert、Update...的性能产生影响
它看起来像：
```
create precomputed result set prs_ir_data
immediate refresh
as
select distinct val_date
from ir_data
```

您可以阅读here and here关于预计算结果集的内容

Answer 2

一个recursive CTE speeds up this query considerably, i.e. very few distinct values in a large table. The problem is that index seeking during search for distinct values is not implemented currently. Here is a link to the approach.

令人费解的索引扫描性能。为什么扫描索引速度很慢，即使结果集很小并且有索引

Puzzling performance of index scan. Why is scanning index slow even though result set is small and indexed

performance

sap-ase