Simple query taking very long time

Question

I'm running a relatively simple query against a Postgres database:

 INSERT INTO tt (pid, trip_pid) SELECT stop_time.pid, trip.pid                                 
 FROM stop_time, trip                                                                          
 WHERE stop_time.trip_id = trip.trip_id AND 
 17 = trip.gtfsfeed_id 
 AND 17 = stop_time.gtfsfeed_id

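tt is a temporary table, stop_time contains about 2 million rows, and trip contains only about 50,000 rows. This query has been running on my AWS RDS instance for over an hour, and I'm not sure why. Is there anything inefficient about this query?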

Edit: here is the EXPLAIN (I created a new temporary table with the same columns to run the explain against)

                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Insert on ll  (cost=2604.38..75394.65 rows=1649975 width=8)
   ->  Hash Join  (cost=2604.38..75394.65 rows=1649975 width=8)
         Hash Cond: ((stop_time.trip_id)::text = (trip.trip_id)::text)
         ->  Seq Scan on stop_time  (cost=0.00..49406.68 rows=1835694 width=34)
               Filter: (gtfsfeed_id = 17)
         ->  Hash  (cost=2123.74..2123.74 rows=38451 width=34)
               ->  Seq Scan on trip  (cost=0.00..2123.74 rows=38451 width=34)
                     Filter: (gtfsfeed_id = 17)

Answer 1

Try this query, it may be faster:

INSERT INTO tt (pid, trip_pid) 
SELECT st.pid, t.pid
FROM stop_time st
join trip t on t.trip_id = st.trip_id
where t.gtfsfeed_id = 17
and st.gtfsfeed_id = 17;

You can also add an index on the gtfsfeed_id column.
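
As a rough sketch of that suggestion, the indexes could be created as follows. The table and column names come from the question; the index names are made up for illustration.

```sql
-- Hypothetical index names; tables and columns are those from the question.
CREATE INDEX stop_time_gtfsfeed_id_idx ON stop_time (gtfsfeed_id);
CREATE INDEX trip_gtfsfeed_id_idx ON trip (gtfsfeed_id);
```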

Answer 2

Check whether the table statistics are accurate, and try building an index on stop_time(gtfsfeed_id) and/or trip(gtfsfeed_id).
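
One way to check and refresh the statistics, as a minimal sketch: ANALYZE recomputes the planner statistics, and the pg_stat_user_tables view shows when each table was last analyzed. Whether stale statistics are actually the issue here is an assumption to verify.

```sql
-- Refresh planner statistics for the two tables in the join.
ANALYZE stop_time;
ANALYZE trip;

-- See the estimated live row counts and when statistics were last gathered.
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('stop_time', 'trip');
```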

Answer 3

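The sequential scans on the trip and stop_time tables suggest that they are not indexed on the trip_id field. Adding an index on trip_id to both tables should significantly improve the JOIN.

In addition, adding an index on gtfsfeed_id in both tables should make the query faster, since your query restricts the results to specific values of those fields.

Tip: it is usually useful to index the fields used in JOIN and WHERE clauses.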
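
A possible way to try this out is a composite index covering both the filter column and the join column on each table, followed by re-running EXPLAIN on the SELECT part of the statement. The index names are hypothetical, and combining the two columns into one index is an assumption rather than something stated above. Note that the plan in the question estimates that about 1.8 million stop_time rows match gtfsfeed_id = 17, so the planner may still prefer a sequential scan.

```sql
-- Hypothetical index names; filter column first, join column second.
CREATE INDEX stop_time_feed_trip_idx ON stop_time (gtfsfeed_id, trip_id);
CREATE INDEX trip_feed_trip_idx ON trip (gtfsfeed_id, trip_id);

-- Compare the new plan with the one posted in the question.
EXPLAIN
SELECT st.pid, t.pid
FROM stop_time st
JOIN trip t ON t.trip_id = st.trip_id
WHERE t.gtfsfeed_id = 17
  AND st.gtfsfeed_id = 17;
```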

sql

postgresql

join

amazon-rds