Using min/max values from a CTE in a later query, instead of using a subquery in Postgres
I have a remedial question about pulling results from a CTE into a later part of a query. For the example code, here are the relevant, stripped-down tables:
CREATE TABLE print_job (
created_dts timestamp not null default now(),
status text not null
);
CREATE TABLE calendar_day (
date_actual date not null
);
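In the current setup, there are gaps in the dates in the print_job data, and we want a result without gaps. For example, the table spans 87 days from the first date to the last, but only 77 of those days have data. We already have a calendar_day dimension table we can join to get 87 rows for the 87-day range. Finding the min and max dates in the data is easy with a subquery or a CTE, but I don't know how to use those values from a CTE. The full query is below, but here is the relevant snippet, with comments: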
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- This CTE does not work: date_range is not listed in its FROM clause, so
-- Postgres reports: missing FROM-clause entry for table "date_range".
complete_date_series_using_cte AS (
select date_actual
from calendar_day
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
),
-- Subqueries are fine, because each one has its own FROM clause naming print_job.
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
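For comparison, the scalar subqueries can read from the date_range CTE instead of rescanning print_job: each scalar subquery has its own FROM clause, so naming the CTE there is legal. A minimal sketch, assuming the two tables above; the temp tables and sample rows are hypothetical, added only so the example runs on its own. With this data it lists the four dates 2021-03-01 through 2021-03-04.

```sql
-- Hypothetical sample data: temp tables shadow any real print_job /
-- calendar_day for this session only.
CREATE TEMP TABLE print_job (
    created_dts timestamp not null default now(),
    status      text      not null
);
CREATE TEMP TABLE calendar_day (
    date_actual date not null
);
INSERT INTO print_job (created_dts, status) VALUES
    ('2021-03-01 09:00', 'Done'),
    ('2021-03-04 17:30', 'Error');
-- Calendar rows on both sides of the data range, so the filter matters.
INSERT INTO calendar_day (date_actual)
SELECT g.d::date
FROM generate_series('2021-02-25'::timestamp,
                     '2021-03-10'::timestamp,
                     interval '1 day') AS g(d);

WITH
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
    from print_job
),
complete_date_series AS (
    select date_actual
    from calendar_day
    -- Each scalar subquery has its own FROM clause, so it may name the CTE.
    where date_actual >= (select start_date from date_range)
      and date_actual <= (select end_date from date_range)
)
select date_actual
from complete_date_series
order by date_actual;
```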
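I run into this often and finally decided to ask. I've searched for an answer, but I'm not sure how to phrase the problem concisely. While there's nothing wrong with the subqueries in this case, I've run into other cases where a CTE is nicer and more readable.
In case it helps, the complete query is below.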
-- Get some counts and give them names.
WITH
daily_status AS (
select created_dts::date as created_date,
count(*) AS daily_total,
count(*) FILTER (where status = 'Error') AS status_error,
count(*) FILTER (where status = 'Processing') AS status_processing,
count(*) FILTER (where status = 'Aborted') AS status_aborted,
count(*) FILTER (where status = 'Done') AS status_done
from print_job
group by created_dts::date
),
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- There are gaps in the data, and we want a row for dates with no results.
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date.
-- This CTE does not work: date_range is not listed in its FROM clause, so
-- Postgres reports: missing FROM-clause entry for table "date_range".
-- complete_date_series_using_cte AS (
-- select date_actual
--
-- from calendar_day
--
-- where date_actual >= date_range.start_date
-- and date_actual <= date_range.end_date
-- ),
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
-- The final query joins the complete date series to the daily print_job summaries.
-- The LEFT JOIN keeps dates with no jobs, and coalesce turns their NULL counts into 0.
select date_actual,
coalesce(daily_total,0) AS total,
coalesce(status_error,0) AS errors,
coalesce(status_processing,0) AS processing,
coalesce(status_aborted,0) AS aborted,
coalesce(status_done,0) AS done
from complete_date_series_using_subquery
left join daily_status
on daily_status.created_date =
complete_date_series_using_subquery.date_actual
order by date_actual
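The comment in the query mentions generate_series as an alternative to the calendar_day dimension table. A minimal sketch of that route, assuming the print_job table above; the temp table and three sample rows are hypothetical, and only two of the four status counts are kept for brevity. generate_series can use date_range's columns because a function call in FROM may reference FROM items to its left (an implicit LATERAL).

```sql
-- Hypothetical sample data for this session only.
CREATE TEMP TABLE print_job (
    created_dts timestamp not null default now(),
    status      text      not null
);
INSERT INTO print_job (created_dts, status) VALUES
    ('2021-03-01 09:00', 'Done'),
    ('2021-03-01 10:15', 'Error'),
    ('2021-03-04 17:30', 'Done');

WITH
daily_status AS (
    select created_dts::date AS created_date,
           count(*) AS daily_total,
           count(*) FILTER (where status = 'Error') AS status_error,
           count(*) FILTER (where status = 'Done')  AS status_done
    from print_job
    group by created_dts::date
),
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
    from print_job
),
-- One row per day from start_date to end_date; the timestamp casts select
-- the generate_series(timestamp, timestamp, interval) variant.
complete_date_series AS (
    select g.d::date AS date_actual
    from date_range,
         generate_series(date_range.start_date::timestamp,
                         date_range.end_date::timestamp,
                         interval '1 day') AS g(d)
)
select c.date_actual,
       coalesce(s.daily_total, 0)  AS total,
       coalesce(s.status_error, 0) AS errors,
       coalesce(s.status_done, 0)  AS done
from complete_date_series AS c
left join daily_status AS s
       on s.created_date = c.date_actual
order by c.date_actual;
```

With this data the result has four rows: 2021-03-01 (total 2, errors 1, done 1), all-zero counts for 2021-03-02 and 2021-03-03, and 2021-03-04 (total 1, done 1).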
I said this was a remedial question... I remembered where I'd seen this before:
https://tapoueh.org/manual-post/2014/02/postgresql-histogram/
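In my example, I needed to list the CTE in the FROM list. It's obvious in hindsight; I realize I instinctively didn't want to, because I habitually avoid CROSS JOIN. Because date_range always returns exactly one row (an aggregate without GROUP BY), this implicit CROSS JOIN does not multiply the calendar_day rows; it only makes start_date and end_date visible to the WHERE clause. The snippet below shows the subtle change needed: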
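WITH
date_range AS (
select min(created_dts)::date as start_date,
max(created_dts)::date as end_date
from print_job
),
complete_date_series AS (
select date_actual
from calendar_day, date_range
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
)
-- Minimal final SELECT so the snippet runs on its own; in the full query,
-- join complete_date_series to daily_status as before.
select date_actual
from complete_date_series
order by date_actual;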
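Assembled into one statement, a sketch of the full query using the date_range CTE. It spells the join out as CROSS JOIN, which is equivalent to the comma in the FROM list, and uses BETWEEN, which is inclusive on both ends like the >= / <= pair. The temp tables and sample rows are hypothetical, added only so the example runs on its own.

```sql
-- Hypothetical sample data for this session only.
CREATE TEMP TABLE print_job (
    created_dts timestamp not null default now(),
    status      text      not null
);
CREATE TEMP TABLE calendar_day (
    date_actual date not null
);
INSERT INTO print_job (created_dts, status) VALUES
    ('2021-03-01 09:00', 'Done'),
    ('2021-03-01 10:15', 'Error'),
    ('2021-03-04 17:30', 'Aborted');
INSERT INTO calendar_day (date_actual)
SELECT g.d::date
FROM generate_series('2021-02-25'::timestamp,
                     '2021-03-10'::timestamp,
                     interval '1 day') AS g(d);

WITH
daily_status AS (
    select created_dts::date AS created_date,
           count(*) AS daily_total,
           count(*) FILTER (where status = 'Error')      AS status_error,
           count(*) FILTER (where status = 'Processing') AS status_processing,
           count(*) FILTER (where status = 'Aborted')    AS status_aborted,
           count(*) FILTER (where status = 'Done')       AS status_done
    from print_job
    group by created_dts::date
),
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
    from print_job
),
complete_date_series AS (
    select cd.date_actual
    from calendar_day AS cd
    -- Same join as "from calendar_day, date_range", written out explicitly.
    cross join date_range AS dr
    where cd.date_actual between dr.start_date and dr.end_date
)
select c.date_actual,
       coalesce(s.daily_total, 0)       AS total,
       coalesce(s.status_error, 0)      AS errors,
       coalesce(s.status_processing, 0) AS processing,
       coalesce(s.status_aborted, 0)    AS aborted,
       coalesce(s.status_done, 0)       AS done
from complete_date_series AS c
left join daily_status AS s
       on s.created_date = c.date_actual
order by c.date_actual;
```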