SELECT DISTINCT ON is very slow

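I have a table that stores rows with an external ID. I often need to select the latest timestamp for a given set of external IDs. This query is now the bottleneck of my application.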

Query:

SELECT DISTINCT ON ("T1"."external_id") "T1"."external_id", "T1"."timestamp" 
FROM "T1" 
WHERE "T1"."external_id" IN ('825889935', '825904511')
ORDER BY "T1"."external_id" ASC, "T1"."timestamp" DESC

EXPLAIN ANALYZE output:

Unique  (cost=169123.13..169123.19 rows=12 width=18) (actual time=1327.443..1334.118 rows=2 loops=1)
   ->  Sort  (cost=169123.13..169123.16 rows=12 width=18) (actual time=1327.441..1334.112 rows=2 loops=1)
         Sort Key: external_id, timestamp DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Gather  (cost=1000.00..169122.91 rows=12 width=18) (actual time=752.577..1334.056 rows=2 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=921.649..1300.556 rows=1 loops=3)
                     Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                     Rows Removed by Filter: 1168882
 Planning Time: 0.592 ms
 Execution Time: 1334.159 ms

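What can I do to make this query faster? Or should I use a completely different query?

Update:

Added a new query plan as requested by @jahrl. The query looks faster here, but the previous plan was captured while the database was under load; now both queries take about the same time.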

Finalize GroupAggregate  (cost=169121.80..169123.21 rows=12 width=18) (actual time=321.009..322.410 rows=2 loops=1)
   Group Key: external_id
   ->  Gather Merge  (cost=169121.80..169123.04 rows=10 width=18) (actual time=321.003..322.403 rows=2 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=168121.77..168121.86 rows=5 width=18) (actual time=318.671..318.672 rows=1 loops=3)
               Group Key: external_id
               ->  Sort  (cost=168121.77..168121.78 rows=5 width=18) (actual time=318.664..318.665 rows=1 loops=3)
                     Sort Key: external_id
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Parallel Seq Scan on T1  (cost=0.00..168121.71 rows=5 width=18) (actual time=144.338..318.611 rows=1 loops=3)
                           Filter: ((external_id)::text = ANY ('{825889935,825904511}'::text[]))
                           Rows Removed by Filter: 1170827
 Planning Time: 0.093 ms
 Execution Time: 322.441 ms

Maybe a basic GROUP BY query would perform better?

SELECT "T1"."external_id", MAX("T1"."timestamp") as "timestamp"
FROM "T1" 
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC

And, as @melcher said, don't forget an index on ("external_id", "timestamp")!
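
A minimal sketch of that index, assuming PostgreSQL and the table and column names from the question. The index name is made up for illustration, and CONCURRENTLY is an optional choice to avoid blocking writes during the build (it cannot be run inside a transaction block):

```sql
-- Hypothetical index name; table and columns are taken from the question.
-- "timestamp" DESC matches the ORDER BY of the DISTINCT ON query, although
-- a plain ascending B-tree can also be scanned backwards.
CREATE INDEX CONCURRENTLY t1_external_id_timestamp_idx
    ON "T1" ("external_id", "timestamp" DESC);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE "T1";
```

With an index like this, each external_id can be located directly and its newest timestamp read from one end of the matching index range, so the planner no longer has to scan the whole table. Whether it actually chooses the index depends on the table statistics and the number of IDs in the IN list.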

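Look at the number of rows removed by the filter (over a million per worker, to return two rows) and create an index on external_id.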
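
As a rough sketch of this suggestion, again assuming PostgreSQL and using a made-up index name, you could create the single-column index and then re-run the plan to check whether the Parallel Seq Scan and its "Rows Removed by Filter" line are gone:

```sql
-- Hypothetical index name; the column comes from the question.
CREATE INDEX t1_external_id_idx ON "T1" ("external_id");

-- Re-check the plan. The hope is an index or bitmap scan on "T1"
-- instead of the Parallel Seq Scan shown in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "T1"."external_id", MAX("T1"."timestamp") AS "timestamp"
FROM "T1"
WHERE "T1"."external_id" IN ('825889935', '825904511')
GROUP BY "T1"."external_id"
ORDER BY "T1"."external_id" ASC;
```

An index on external_id alone still has to visit every row for each ID to find the maximum timestamp, so it helps most when each ID has only a few rows. If some IDs have many rows, the composite index on ("external_id", "timestamp") is probably the better fit.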