Postgres (pgsql) taking too long to retrieve data from a table with more than 1.5 billion rows
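How can I optimize the table, or the pgsql query below? It takes 34 minutes to return 770 records. I have already added indexes on a few columns of the table and am not sure what else can be done to speed this query up.
Query: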
SELECT
min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') as Date,
'America/Los_Angeles' AS Timezone,
sum(GREATEST(0, p.value)) as Value,
p.uom as UnitOfMeasurement
FROM
pv.bsa_vessel_vs p
WHERE
p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
and p.start_timestamp AT TIME ZONE p.timezone < '2017-02-01'
and p.vessel_serial_number ='U57625059'
GROUP BY
date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom
ORDER BY
Date ;
Table:
CREATE TABLE pv.bsa_vessel_vs
(
bsa_vessel_vs_id bigserial NOT NULL,
data_source_id bigint NOT NULL,
start_timestamp timestamp without time zone NOT NULL,
end_timestamp timestamp without time zone NOT NULL,
value numeric(12,4) NOT NULL,
uom text NOT NULL,
timezone text NOT NULL,
created_timestamp timestamp without time zone DEFAULT now(),
updated_timestamp timestamp without time zone DEFAULT now(),
vessel_serial_number text NOT NULL,
CONSTRAINT bsa_vessel_vs_pkey PRIMARY KEY (bsa_vessel_vs_id),
CONSTRAINT bsa_vessel_vs_data_source_id_fkey FOREIGN KEY (data_source_id)
REFERENCES pv.data_source (data_source_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
CREATE INDEX pm_start_timestamp_ndex
ON pv.bsa_vessel_vs
USING btree
(start_timestamp DESC NULLS LAST);
CREATE INDEX bsa_vessel_vs_meter_ts_idx
ON pv.bsa_vessel_vs
USING btree
(vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp);
CREATE UNIQUE INDEX bsa_vessel_vs_u_idx
ON pv.bsa_vessel_vs
USING btree
(data_source_id, vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp DESC);
Thanks,
Kasi
Change your index so that it contains the same expressions you use in the WHERE clause, i.e.:
CREATE INDEX bsa_vessel_vs_meter_ts_2_idx
ON bsa_vessel_vs
USING btree
( vessel_serial_number COLLATE pg_catalog."default",
(start_timestamp AT TIME ZONE timezone),
(end_timestamp AT TIME ZONE timezone)
);
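On a table with more than 1.5 billion rows, a plain CREATE INDEX blocks writes for the whole build. The following is a minimal sketch of one way to build such an index, not a tested recipe for this system. It assumes the table lives in the `pv` schema as in the question, uses a hypothetical index name, and trims the index to the two expressions the WHERE clause actually filters on.

```sql
-- Assumption: the table is pv.bsa_vessel_vs, as in the question.
-- Hypothetical index name; pick one that fits your naming scheme.
-- CONCURRENTLY avoids blocking writes during the build, but it is slower
-- and cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY bsa_vessel_vs_serial_start_tz_idx
  ON pv.bsa_vessel_vs
  USING btree
  ( vessel_serial_number,
    (start_timestamp AT TIME ZONE timezone)
  );

-- Collect statistics for the indexed expression so the planner can
-- estimate how many rows the range condition will match.
ANALYZE pv.bsa_vessel_vs;
```

If a concurrent build fails partway, it leaves an invalid index behind that has to be dropped before retrying.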
Once that index is defined, you get an execution plan that uses it:
| QUERY PLAN |
| :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sort (cost=69.60..69.70 rows=39 width=83) |
| Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))) |
| -> HashAggregate (cost=67.79..68.57 rows=39 width=83) |
| Group Key: date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))), uom |
| -> Index Scan using bsa_vessel_vs_meter_ts_2_idx on bsa_vessel_vs p (cost=0.28..67.20 rows=39 width=44) |
| Index Cond: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |
However, if the index does not exist, PostgreSQL falls back to a full table scan:
| QUERY PLAN |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sort (cost=298.84..298.94 rows=39 width=83) |
| Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))) |
| -> GroupAggregate (cost=296.35..297.81 rows=39 width=83) |
| Group Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom |
| -> Sort (cost=296.35..296.45 rows=39 width=44) |
| Sort Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom |
| -> Seq Scan on bsa_vessel_vs p (cost=0.00..295.32 rows=39 width=44) |
| Filter: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |
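Judging by their low costs, the plans above appear to come from a small test table, so the numbers will not match the production table. To check whether the index is actually used on the real data, one option is to run the original query under EXPLAIN (ANALYZE, BUFFERS). This sketch assumes the schema-qualified table name from the question:

```sql
-- EXPLAIN ANALYZE really executes the query, so without a usable index
-- it will take as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') AS Date,
    'America/Los_Angeles' AS Timezone,
    sum(GREATEST(0, p.value)) AS Value,
    p.uom AS UnitOfMeasurement
FROM pv.bsa_vessel_vs p
WHERE p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
  AND p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
  AND p.vessel_serial_number = 'U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'),
    p.uom
ORDER BY Date;
```

Look for an Index Scan (or Bitmap Index Scan) node with an Index Cond on the `timezone(timezone, start_timestamp)` expression. A Seq Scan with a Filter means the planner could not match the WHERE clause to an index.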
You can see the complete setup at dbfiddle here