在以后的查询中使用来自 CTE 的 min/max 值,而不是在 Postgres 中使用子查询

Using min/max values from a CTE in a later query, instead of using a subquery in Postgres

我有一个关于在查询的后面部分从 CTE 中提取结果的补救性问题。对于示例代码,下面是相关的,精简的 tables:

CREATE TABLE print_job (
   created_dts  timestamp  not null default now(),
   status       text       not null
);

CREATE TABLE calendar_day (
   date_actual date not null
);

在当前设置中,print_job 数据中的日期存在间隙,我们希望得到一个没有间隙的结果。比如table中从第一个日期到最后一个日期有87天,里面只有77天有数据。我们已经有一个 calendar_day 维度 table 可以加入以获得 87 天范围内的 87 行。使用子查询或 CTE 很容易找出数据中的 minmax 日期,但我不知道如何使用 CTE 中的这些值。我在下面有一个完整的查询,但这里是带有评论的相关片段:

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
     select actual_date
     
       from calendar_day

   where actual_date >= date_range.start_date  
     and actual_date <= date_range.end_date
),

-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)   
)

我 运行 经常参与其中,最后想问一下。我已经四处寻找答案,但我不清楚如何很好地总结它。虽然在这种情况下子查询没有任何问题,但我遇到了 CTE nicer/more 可读的其他情况。

如果有帮助,我在下面列出了完整的查询。

-- Get some counts and give them names.
WITH 
daily_status AS (
    select created_dts::date as created_date,
           count(*)                                        AS daily_total,
           count(*) FILTER (where status = 'Error')        AS status_error,
           count(*) FILTER (where status = 'Processing')   AS status_processing,
           count(*) FILTER (where status = 'Aborted')      AS status_aborted,
           count(*) FILTER (where status = 'Done')         AS status_done

     from print_job

     group by created_dts::date
),

-- Get the date range from the data.
date_range AS (
    select min(created_dts::date) AS start_date,
           max(created_dts::date) AS end_date
           
   from print_job),
   
-- There are gaps in the data, and we want a row for dates with no results.  
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date. 

-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
--      select actual_date
--      
--        from calendar_day
-- 
--    where actual_date >= date_range.start_date  
--      and actual_date <= date_range.end_date
-- ),

complete_date_series_using_subquery AS (

   select date_actual
     
     from calendar_day

    where date_actual >= (select min(created_dts::date) from print_job)
      and date_actual <= (select max(created_dts::date) from print_job)
     
)

-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
       coalesce(daily_total,0)        AS total, 
       coalesce(status_error,0)       AS errors,
       coalesce(status_processing,0)  AS processing,
       coalesce(status_aborted,0)     AS aborted,
       coalesce(status_done,0)        AS done
       
   from complete_date_series_using_subquery
left join daily_status
       on daily_status.created_date  = 
          complete_date_series_using_subquery.date_actual

    order by date_actual

我说这是一个补救题....我记得我以前在哪里看到过这个:

https://tapoueh.org/manual-post/2014/02/postgresql-histogram/

在我的示例中,我需要在 table 列表中列出 CTE。回想起来这很明显,我意识到我自动 想这样做,因为我习惯性地避免 CROSS JOIN。下面的片段显示了所需的细微变化:

WITH
date_range AS ( 

select min(created_dts)::date as start_date,
       max(created_dts)::date as end_date
       
  from print_job
),        

complete_date_series AS ( 

select date_actual 
  from calendar_day, date_range
 where date_actual >= date_range.start_date   
   and date_actual <= date_range.end_date
),