Is there any performance improvement when we use a unique index instead of a non-unique index?

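I know that, in theory, a unique index will be faster than a non-unique index if the data is unique,

because a unique index provides more information, which lets the query optimizer choose a more efficient execution plan.

I'm running some tests to prove that a unique index can be better than a non-unique index in the execution plan, but the results tell me they are the same...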

CREATE TABLE T3(
    ID INT NOT NULL,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

CREATE INDEX IX_ID_T3 ON T3 (ID);
CREATE UNIQUE INDEX UIX_ID_T3 ON T3 (ID);

INSERT INTO T3
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

vacuum ANALYZE T3;

I created a table and two indexes (IX_ID_T3 is non-unique, UIX_ID_T3 is unique), then inserted 1,000,000 sample rows.

After inserting the data, I ran vacuum ANALYZE T3;

--drop index IX_ID_T3 

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT DISTINCT a1.ID
FROM T3 a1 INNER JOIN T3 a2
ON a1.id = a2.id
WHERE a1.id <= 300000;

For the first query, I tried to compare UIX_ID_T3 and IX_ID_T3 using a merge join.

Neither the Buffers: shared hit counts nor the execution plans differ.

Here are my execution plans:

-- UIX_ID_T3 
"Unique  (cost=0.85..41457.94 rows=298372 width=4) (actual time=0.030..267.207 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40712.01 rows=298372 width=4) (actual time=0.030..200.412 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using uix_id_t3 on t3 a1  (cost=0.42..8501.93 rows=298372 width=4) (actual time=0.017..49.237 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using uix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.010..40.170 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.171 ms"
"Execution Time: 282.919 ms"

---IX_ID_T3 
"Unique  (cost=0.85..41420.43 rows=297587 width=4) (actual time=0.027..230.256 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40676.46 rows=297587 width=4) (actual time=0.027..173.308 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using ix_id_t3 on t3 a1  (cost=0.42..8476.20 rows=297587 width=4) (actual time=0.015..41.606 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using ix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.009..34.019 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.195 ms"
"Execution Time: 243.711 ms"

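There is another question, are-unique-indexes-better-for-column-search-performance-pgsql-mysql, that discusses this topic.

I also tried the query from that question's answer, but the execution plans do not differ either.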

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT  id
FROM    T3
ORDER BY
        id
LIMIT 10;
-- using UIX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.034..0.036 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using uix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.033..0.034 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.052 ms"
"Execution Time: 0.047 ms"

-- using IX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.026..0.029 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using ix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.025..0.027 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.075 ms"
"Execution Time: 0.043 ms"

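I have read a lot of different articles, but I can't prove from an execution plan that a unique index is better than a non-unique index.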

Postgres unique constraint vs index

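Question:

Can anyone prove from an execution plan that a unique index is better than a non-unique index, and show us the query and the execution plan?

To my knowledge, in SQL Server a unique index is not only a constraint but can also perform better than a non-unique index.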

The Many Mysteries of Merge Joins

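A unique index won't be faster to scan than a non-unique index. The only potential benefit for query execution speed is that the optimizer can make certain deductions from the uniqueness, for example removing unnecessary joins.

The main use of a unique index is to implement a table constraint, not to provide a performance advantage over a non-unique index.

Here is an example: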

CREATE TABLE parent (pid bigint PRIMARY KEY);

CREATE TABLE child (
   cid bigint PRIMARY KEY,
   pid bigint UNIQUE REFERENCES parent
);

EXPLAIN (COSTS OFF)
SELECT parent.pid FROM parent LEFT JOIN child USING (pid);

     QUERY PLAN     
════════════════════
 Seq Scan on parent
(1 row)

Without the unique constraint on child.pid (which is implemented by a unique index), the join could not be removed.
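To see the contrast, you could run the same kind of query against a table whose join column is not unique. The following is only a sketch: child_nonunique and child_nonunique_pid_idx are hypothetical names introduced here for illustration, and the second half assumes the T3 table and both indexes from the question still exist. No plan output is shown because the exact shape depends on your version and data.

```sql
-- Hypothetical variant of child: same columns, but pid is not declared UNIQUE.
CREATE TABLE child_nonunique (
   cid bigint PRIMARY KEY,
   pid bigint REFERENCES parent
);

-- A plain (non-unique) index on pid, for a like-for-like comparison.
CREATE INDEX child_nonunique_pid_idx ON child_nonunique (pid);

-- Without a uniqueness guarantee, one parent row may match several child rows,
-- so the planner has to keep the join to return the right number of rows.
EXPLAIN (COSTS OFF)
SELECT parent.pid FROM parent LEFT JOIN child_nonunique USING (pid);

-- The same idea applied to the question's table: with the unique index in
-- place, the outer join to a2 should be removable.
EXPLAIN (COSTS OFF)
SELECT a1.id FROM t3 a1 LEFT JOIN t3 a2 ON a1.id = a2.id;

-- Temporarily drop the unique index so only ix_id_t3 remains, then compare.
BEGIN;
DROP INDEX uix_id_t3;
EXPLAIN (COSTS OFF)
SELECT a1.id FROM t3 a1 LEFT JOIN t3 a2 ON a1.id = a2.id;
ROLLBACK;  -- restores uix_id_t3
```

If the deduction applies, the plans where uniqueness is known should scan only one relation, while the others should still contain a join node. DROP INDEX takes an exclusive lock on the table until the ROLLBACK, so this is best tried on a test database.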