Postgres: pgsql query takes a long time to retrieve data from a table with more than 1.5 billion rows

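How can I optimize the table or the following pgsql query (it takes 34 minutes to return 770 records)? I have already added indexes on a few columns of the table. I am not sure what else can be done to speed up this query.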

Query:

SELECT 
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') as Date, 
    'America/Los_Angeles' AS Timezone, 
    sum(GREATEST(0, p.value)) as Value, 
    p.uom as UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p                                 
WHERE
        p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    and p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
    and p.vessel_serial_number ='U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom   
ORDER BY
    Date ;

Table:

CREATE TABLE pv.bsa_vessel_vs
(
  bsa_vessel_vs_id bigserial NOT NULL,
  data_source_id bigint NOT NULL,
  start_timestamp timestamp without time zone NOT NULL,
  end_timestamp timestamp without time zone NOT NULL,
  value numeric(12,4) NOT NULL,
  uom text NOT NULL,
  timezone text NOT NULL,
  created_timestamp timestamp without time zone DEFAULT now(),
  updated_timestamp timestamp without time zone DEFAULT now(),
  vessel_serial_number text NOT NULL,
  CONSTRAINT bsa_vessel_vs_pkey PRIMARY KEY (bsa_vessel_vs_id),
  CONSTRAINT bsa_vessel_vs_data_source_id_fkey FOREIGN KEY (data_source_id)
      REFERENCES pv.data_source (data_source_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);

CREATE INDEX pm_start_timestamp_ndex
  ON pv.bsa_vessel_vs
  USING btree
  (start_timestamp DESC NULLS LAST);

CREATE INDEX bsa_vessel_vs_meter_ts_idx
  ON pv.bsa_vessel_vs
  USING btree
  (vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp);


CREATE UNIQUE INDEX bsa_vessel_vs_u_idx
  ON pv.bsa_vessel_vs
  USING btree
  (data_source_id, vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp DESC);

Thanks, Kasi

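Change your index so that it contains the same expression you use in your WHERE clause. The existing indexes cover the raw start_timestamp column, so the planner cannot use them for a range condition on start_timestamp AT TIME ZONE timezone. That is: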

CREATE INDEX bsa_vessel_vs_meter_ts_2_idx
  ON bsa_vessel_vs
  USING btree
  ( vessel_serial_number COLLATE pg_catalog."default", 
    (start_timestamp AT TIME ZONE timezone)
  );
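
On a table of this size, building the index with a plain CREATE INDEX blocks writes for the duration of the build. The following is a minimal sketch of a non-blocking alternative to run in place of the statement above, assuming the table lives in the pv schema as in the question's DDL and that the default collation is acceptable:

```sql
-- Assumption: schema-qualified name taken from the question's CREATE TABLE.
-- CONCURRENTLY avoids blocking writes while the index is built;
-- it cannot be run inside a transaction block and takes longer than a plain build.
CREATE INDEX CONCURRENTLY bsa_vessel_vs_meter_ts_2_idx
  ON pv.bsa_vessel_vs
  USING btree
  ( vessel_serial_number,
    (start_timestamp AT TIME ZONE timezone)
  );

-- Refresh planner statistics; ANALYZE also gathers statistics
-- for the indexed expression, which helps row estimates.
ANALYZE pv.bsa_vessel_vs;
```

If a concurrent build fails partway, it leaves an invalid index behind that has to be dropped before retrying.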

Once you have defined that index, you get an execution plan that uses it:

| QUERY PLAN                                                                                                                                                                                                                                                            |
| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sort  (cost=69.60..69.70 rows=39 width=83)                                                                                                                                                                                                                            |
|   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))                                                                                                                                                                         |
|   ->  HashAggregate  (cost=67.79..68.57 rows=39 width=83)                                                                                                                                                                                                             |
|         Group Key: date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))), uom                                                                                                                                          |
|         ->  Index Scan using bsa_vessel_vs_meter_ts_2_idx on bsa_vessel_vs p  (cost=0.28..67.20 rows=39 width=44)                                                                                                                                                     |
|               Index Cond: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |

However, if the index does not exist, PostgreSQL resorts to a full table scan:

| QUERY PLAN                                                                                                                                                                                                                                                              |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sort  (cost=298.84..298.94 rows=39 width=83)                                                                                                                                                                                                                            |
|   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))                                                                                                                                                                           |
|   ->  GroupAggregate  (cost=296.35..297.81 rows=39 width=83)                                                                                                                                                                                                            |
|         Group Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom                                                                                                                                          |
|         ->  Sort  (cost=296.35..296.45 rows=39 width=44)                                                                                                                                                                                                                |
|               Sort Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom                                                                                                                                     |
|               ->  Seq Scan on bsa_vessel_vs p  (cost=0.00..295.32 rows=39 width=44)                                                                                                                                                                                     |
|                     Filter: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |
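
To compare plans like these on your own data, you can wrap the query in EXPLAIN. The sketch below is one way to do it, not a required form. It also spells the date bounds as explicit timestamptz literals: the plans above show '2017-01-01 00:00:00+00' because the bare string is cast using the session's TimeZone setting. The +00 offset here is an assumption that mirrors those plans, so substitute whichever offset you actually mean.

```sql
-- Note: the ANALYZE option really executes the query, so on the
-- 1.5-billion-row table expect it to take as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p
WHERE
        -- Assumed UTC bounds; a bare '2017-01-01' would follow the session TimeZone.
        p.start_timestamp AT TIME ZONE p.timezone >= TIMESTAMPTZ '2017-01-01 00:00:00+00'
    AND p.start_timestamp AT TIME ZONE p.timezone <  TIMESTAMPTZ '2017-02-01 00:00:00+00'
    AND p.vessel_serial_number = 'U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'),
    p.uom
ORDER BY
    Date;
```

In the output, look for an Index Scan (or Bitmap Index Scan) node on bsa_vessel_vs_meter_ts_2_idx with both timestamp bounds listed under Index Cond rather than under Filter.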

You can check the whole setup at dbfiddle here