PostgreSQL slow query

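An OpenVAS instance (backed by PostgreSQL) is very slow when opening the 'Tasks' tab.

The following query takes 22 seconds to run in PostgreSQL. Are there any suggestions for optimizing it?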

SELECT id, host,
       iso_time (start_time), iso_time (end_time),
       current_port, max_port, report,
       (SELECT uuid FROM reports WHERE id = report),
       (SELECT uuid FROM hosts
        WHERE id = (SELECT host FROM host_identifiers
                    WHERE source_type = 'Report Host'
                      AND name = 'ip'
                      AND source_id = (SELECT uuid FROM reports
                                       WHERE id = report)
                      AND value = report_hosts.host
                    LIMIT 1)
       )
FROM report_hosts
WHERE report = 702;

The execution plan is:

 Index Scan using report_hosts_by_report on report_hosts  (cost=0.42..1975570.99 rows=447 width=38) (actual time=50.042..22979.257 rows=1206 loops=1)
   Index Cond: (report = 702)
   SubPlan 1
     ->  Index Scan using reports_pkey on reports  (cost=0.28..2.49 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=1206)
           Index Cond: (id = report_hosts.report)
   SubPlan 4
     ->  Index Scan using hosts_pkey on hosts  (cost=4414.37..4416.59 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1206)
           Index Cond: (id = )
           InitPlan 3 (returns )
             ->  Limit  (cost=2.49..4414.09 rows=1 width=4) (actual time=18.998..18.998 rows=0 loops=1206)
                   InitPlan 2 (returns )
                     ->  Index Scan using reports_pkey on reports reports_1  (cost=0.28..2.49 rows=1 width=37) (actual time=0.001..0.001 rows=1 loops=1206)
                           Index Cond: (id = report_hosts.report)
                   ->  Seq Scan on host_identifiers  (cost=0.00..4411.60 rows=1 width=4) (actual time=18.997..18.997 rows=0 loops=1206)
                         Filter: ((source_type = 'Report Host'::text) AND (name = 'ip'::text) AND (source_id = ) AND (value = report_hosts.host))
                         Rows Removed by Filter: 99459
 Planning time: 0.531 ms
 Execution time: 22979.575 ms

Almost all of the time is spent in the 1206 sequential scans of host_identifiers (about 19 ms each, roughly 22.9 s in total).
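
To confirm that the host_identifiers lookup is the bottleneck, one option is to run just the inner lookup on its own under EXPLAIN. This is only a sketch: the IP value below is a hypothetical placeholder and should be replaced with a host that actually belongs to report 702.

```sql
-- Inner lookup from the original query, isolated for a single host.
-- '192.0.2.10' is a made-up example value, not taken from the data.
EXPLAIN (ANALYZE, BUFFERS)
SELECT host
FROM host_identifiers
WHERE source_type = 'Report Host'
  AND name = 'ip'
  AND source_id = (SELECT uuid FROM reports WHERE id = 702)
  AND value = '192.0.2.10'
LIMIT 1;
```

Without a suitable index this should show a Seq Scan on host_identifiers, as in the plan above; with one, an index scan would be expected instead.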

Try replacing the subqueries with joins:

SELECT rh.id, rh.host,
       iso_time(rh.start_time), iso_time(rh.end_time),
       rh.current_port, rh.max_port, rh.report,
       r.uuid,
       h.uuid
FROM report_hosts AS rh
   LEFT JOIN reports AS r
      ON rh.report = r.id
   LEFT JOIN host_identifiers AS hi
      ON hi.source_id = r.uuid
         AND hi.value = rh.host
         AND hi.source_type = 'Report Host'
         AND hi.name = 'ip'
   LEFT JOIN hosts AS h
      ON h.id = hi.host
WHERE rh.report = 702;

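This is not exactly equivalent, because it does not reproduce the LIMIT 1 (which is meaningless without an ORDER BY anyway): if several host_identifiers rows match, the join returns one row for each of them. It should come close, though.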
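
If the "at most one match per row" behaviour of LIMIT 1 matters, a LATERAL join can keep it. The following is a sketch, assuming PostgreSQL 9.3 or later; ordering by hi.host is an arbitrary choice made here only so that the selected row is deterministic, and the column aliases are illustrative.

```sql
SELECT rh.id, rh.host,
       iso_time(rh.start_time), iso_time(rh.end_time),
       rh.current_port, rh.max_port, rh.report,
       r.uuid AS report_uuid,
       h.uuid AS host_uuid
FROM report_hosts AS rh
   LEFT JOIN reports AS r
      ON rh.report = r.id
   -- For each report_hosts row, pick at most one matching identifier.
   LEFT JOIN LATERAL (
      SELECT hi.host
      FROM host_identifiers AS hi
      WHERE hi.source_id = r.uuid
        AND hi.value = rh.host
        AND hi.source_type = 'Report Host'
        AND hi.name = 'ip'
      ORDER BY hi.host   -- arbitrary but deterministic tie-break
      LIMIT 1
   ) AS hi1 ON true
   LEFT JOIN hosts AS h
      ON h.id = hi1.host
WHERE rh.report = 702;
```

Like the original scalar subqueries, this returns exactly one row per report_hosts row, with a NULL host uuid when no identifier matches.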

The right indexes will make it fast (create them if they do not exist yet):

  • one on reports(id)
  • one on host_identifiers(source_id, value)
  • one on hosts(id)
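
To see which of these already exist, the pg_indexes system view can be queried. A minimal sketch, assuming the tables live in a schema on the current search path:

```sql
-- List the existing indexes on the three tables involved.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('reports', 'host_identifiers', 'hosts')
ORDER BY tablename, indexname;
```

The plan above already uses reports_pkey and hosts_pkey, so the index on host_identifiers is probably the only one missing.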

Your query is hard to read because you did not qualify the column names with their table names.

Wow! Adding the index on host_identifiers(source_id, value) was exactly what I was looking for:

CREATE INDEX host_identifiers_source_id_value ON host_identifiers(source_id, value);

The page load time of the 'Tasks' tab dropped from 70 seconds to 13 seconds.

Thanks!