PostgreSQL does a sequential scan when filtering with a subquery
I have 2 tables:
requests: about 60 million records (used as a website log)
requests_hours: a few hundred rows (updated every minute from the requests table)
I have the following simple query, but it takes about 5 minutes to run because Postgres does not use the index on the request_time_utc column and does a sequential scan instead.
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)
But if I remove the subquery (which on its own runs in 0.003 seconds) and replace it with a static value, as below, the query runs in just 0.008 seconds:
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= '2019-09-30 17:00:00'
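A likely explanation, worth confirming with EXPLAIN: the subquery in the first query is uncorrelated, so PostgreSQL runs it as an InitPlan whose value is only known at execution time. When it builds the plan, the planner therefore cannot compare the bound against the column statistics for request_time_utc. It falls back to a fixed default selectivity for the >= comparison (roughly one third of the rows). On 60 million rows that predicts far more matches than the 1,000 to 7,000 actually expected, so a sequential scan looks cheaper. With the literal, the planner can use the column's histogram instead. The sketch below compares the two plans. It assumes an index on requests(request_time_utc) exists, as the question states. Note that EXPLAIN ANALYZE actually executes each query.

```sql
-- Assumes an index on requests(request_time_utc) already exists.
-- EXPLAIN (ANALYZE) really runs the query, so this first one will take the full ~5 minutes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours);

-- Same filter with a literal: compare the estimated "rows=" on the scan node
-- with the estimate in the plan above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= '2019-09-30 17:00:00';
```

If the explanation holds, the first plan should show an InitPlan and a much larger row estimate on the requests scan than the second.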
Each run should only count a small number of rows, between 1,000 and 7,000, so an index scan on request_time_utc should be much faster than a sequential scan.
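Because the literal version already gets the fast plan, one workaround is to fetch the bound first and then send it to the second query as a constant. Below is a minimal sketch for the psql command-line client, using its \gset meta-command to store the result in a psql variable. The variable name max_hour is hypothetical. Other clients can do the same thing by running the first query and inlining its result as a literal in the second.

```sql
-- psql only: \gset (used instead of the terminating semicolon) stores each output
-- column of this query in a psql variable named after the column, here max_hour.
SELECT MAX(request_hour_utc) AS max_hour FROM requests_hours \gset

-- :'max_hour' interpolates the stored value as a quoted SQL literal, so the planner
-- sees a constant, just as in the '2019-09-30 17:00:00' example.
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= :'max_hour';
```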
I don't understand how to force PostgreSQL to use an index scan for the first query.
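Stock PostgreSQL has no per-query index hints. For diagnosis, though, the enable_seqscan setting can discourage sequential scans for a single transaction. It does not strictly forbid them; it only makes them look very expensive to the planner. This is a way to check whether the index plan is really faster, not a production fix.

```sql
BEGIN;
-- Applies only until the end of this transaction.
SET LOCAL enable_seqscan = off;

-- Plain EXPLAIN shows the chosen plan without running the query.
EXPLAIN
SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours);

ROLLBACK;
```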
The query above is simplified to illustrate the problem; here is the original:
SELECT
customer_id,
DATE_TRUNC('hour', request_time_utc) AS request_hour_utc,
COUNT(request_id) AS total_requests,
SUM(data_in_size) AS total_data_in_size,
SUM(data_out_size) AS total_data_out_size,
SUM(process_long) AS total_process_long
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)
AND customer_id IS NOT NULL
GROUP BY request_hour_utc , customer_id
ORDER BY request_hour_utc DESC;
Move your subquery into a CTE, like this (I'm writing this on my phone on the train, so you'll need to adapt it to your actual query :-)):
WITH your_max AS (SELECT MAX(request_hour_utc) as foo FROM requests_hours)
SELECT COUNT(request_id)
FROM requests CROSS JOIN your_max
WHERE request_time_utc >= your_max.foo
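Applied to the original aggregate query, the suggested CTE rewrite might look like the sketch below. Whether it actually produces an index scan depends on the planner's estimates and on the PostgreSQL version, so check it with EXPLAIN before relying on it. From version 12, a side-effect-free CTE referenced only once is inlined unless it is declared AS MATERIALIZED. The GROUP BY repeats the DATE_TRUNC expression instead of using the output alias, which avoids any ambiguity with a same-named input column.

```sql
WITH your_max AS (
    -- Single-row CTE holding the latest hour already aggregated
    SELECT MAX(request_hour_utc) AS foo
    FROM requests_hours
)
SELECT
    r.customer_id,
    DATE_TRUNC('hour', r.request_time_utc) AS request_hour_utc,
    COUNT(r.request_id)  AS total_requests,
    SUM(r.data_in_size)  AS total_data_in_size,
    SUM(r.data_out_size) AS total_data_out_size,
    SUM(r.process_long)  AS total_process_long
FROM requests AS r
CROSS JOIN your_max
WHERE r.request_time_utc >= your_max.foo
  AND r.customer_id IS NOT NULL
GROUP BY DATE_TRUNC('hour', r.request_time_utc), r.customer_id
ORDER BY request_hour_utc DESC;
```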