PostgreSQL 子查询案例顺序扫描

PostgreSQL subquery case sequential scan

我有 2 个表:

  1. requests:内容6000万条记录(用作网站日志)

  2. requests_hours:内容几百行(从requests表中每分钟更新一次)

我有以下简单的查询,但是当我执行它时,大约需要 5 分钟才能完成,因为 Postgres 不使用 request_time_utc 列的索引,只是进行顺序扫描。

SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)

但是如果我只是删除子查询(它本身在 0.003 秒内执行)并将其替换为如下所示的静态值,我将在仅 0.008 秒内执行以下查询:

SELECT COUNT(request_id)
FROM requests
WHERE request_time_utc >= '2019-09-30 17:00:00'

查询应该每分钟只计算几行,从 1000 到 7000,所以对列 request_time_utc 的索引扫描肯定比顺序扫描好得多。

我不明白如何强制 PostgreSQL 对第一个查询进行索引扫描。

以上查询是为了简化问题;这是原始的:

SELECT 
            customer_id,
            DATE_TRUNC('hour', request_time_utc) AS request_hour_utc,
            COUNT(request_id) AS total_requests,
            SUM(data_in_size) AS total_data_in_size,
            SUM(data_out_size) AS total_data_out_size,
            SUM(process_long) AS total_process_long
            FROM requests
            WHERE request_time_utc >= (SELECT MAX(request_hour_utc) FROM requests_hours)
                        AND customer_id IS NOT NULL
            GROUP BY request_hour_utc , customer_id
            ORDER BY request_hour_utc DESC;

将你的子查询移动到 CTE,就像这样(我在火车上 phone 写这篇文章,所以你需要在正确的查询处着陆:-)):

WITH your_max AS (SELECT MAX(request_hour_utc) as foo FROM requests_hours)
SELECT COUNT(request_id)
FROM requests CROSS JOIN your_max
WHERE request_time_utc >= your_max.foo