Fill in missing dates per identifier in SQL without adding extra dates after the identifier is no longer in the database

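To put the problem in words: I have a huge table that contains subscribers and one row of data per day. Once a subscriber no longer exists, it has no further records. For example, if SUB123 no longer exists from 28/10/2021, that subscriber has a record for every day up to 27/10/2021. The problem is that some subscribers are missing dates, perhaps because the date fell on a weekend or for some other reason. I want to fill in those dates with NULL values so that every day is accounted for.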

Current issue:

Subscriber Date Rev
sub123 25/10/2021 256
sub456 25/10/2021 282
sub123 26/10/2021 652
sub123 27/10/2021 396
sub456 28/10/2021 132
sub456 29/10/2021 484
sub456 01/11/2021 96
sub456 02/11/2021 45

Desired result:

Subscriber Date Rev
sub123 25/10/2021 256
sub456 25/10/2021 282
sub123 26/10/2021 652
sub456 26/10/2021 NULL
sub123 27/10/2021 396
sub456 27/10/2021 NULL
sub456 28/10/2021 132
sub456 29/10/2021 484
sub456 30/10/2021 NULL
sub456 31/10/2021 NULL
sub456 01/11/2021 96
sub456 02/11/2021 45

My current attempt:

WITH all_dates as (
SELECT
     CAST(date_column AS DATE) date_column, b.subscriber, b.date
FROM
 (VALUES
     (SEQUENCE(
      min(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
      max(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
      INTERVAL '1' DAY)
     )
 ) AS t1(date_array)
CROSS JOIN
 UNNEST(date_array) AS t2(date_column) 
LEFT JOIN MAINTABLE b 
on t2.date_column = b.date
), 
customer_dates as (
SELECT distinct a.subscriber, a.date, b.date_column
from MAINTABLE a
left join all_dates b
on a.date = b.date_column
    )
    SELECT *
    from customer_dates a

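This code does not work, but it shows what I am trying to do. If I use the code below instead, it generates every date from the start date to the end date for all subscribers, which is not what we want. That is why I tried the code above.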

WITH all_dates as (
SELECT
     CAST(date_column AS DATE) date_column, b.subscriber, b.date
FROM
 (VALUES
     (SEQUENCE(
      date('2021-10-25'),
      date('2022-04-30'),
      INTERVAL '1' DAY)
     )
 ) AS t1(date_array)
CROSS JOIN
 UNNEST(date_array) AS t2(date_column) 
LEFT JOIN MAINTABLE b 
on t2.date_column = b.date
), 
customer_dates as (
SELECT distinct a.subscriber, a.date, b.date_column
from MAINTABLE a
left join all_dates b
on a.date = b.date_column
    )
    SELECT *
    from customer_dates a

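You can use the lag function to find the previous date recorded for the same subscriber, generate the missing range between the two dates with sequence, flatten it with unnest, and handle Rev separately.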
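As a minimal sketch of the range step in isolation, the statement below uses literal dates picked for illustration: they correspond to the sub456 row for 28/10/2021, whose previous recorded date is 25/10/2021. It assumes a Presto/Trino-style engine where sequence accepts DATE bounds with a day interval.

```sql
-- Illustration only: the dates are hard-coded instead of coming from lag().
-- sequence() builds every day from the previous date through the current one;
-- array_except() then removes the previous date, because the previous row
-- already produces it.
SELECT array_except(
    sequence(DATE '2021-10-25', DATE '2021-10-28', INTERVAL '1' DAY),
    ARRAY[DATE '2021-10-25']
) AS dates_to_emit
```

The remaining elements are 26/10, 27/10 and 28/10: two gap dates plus the row's own date. For a subscriber's first row the previous date is NULL, so the sequence starts and ends at the row's own date and nothing is removed.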

-- sample data
WITH dataset (Subscriber, Date, Rev) AS (
    VALUES ('sub123',   date_parse('25-10-2021', '%d-%m-%Y'),   256),
    ('sub456',  date_parse('25-10-2021', '%d-%m-%Y'),   282),
    ('sub123',  date_parse('26-10-2021', '%d-%m-%Y'),   652),
    ('sub123',  date_parse('27-10-2021', '%d-%m-%Y'),   396),
    ('sub456',  date_parse('28-10-2021', '%d-%m-%Y'),   132),
    ('sub456',  date_parse('29-10-2021', '%d-%m-%Y'),   484),
    ('sub456',  date_parse('01-11-2021', '%d-%m-%Y'),   96),
    ('sub456',  date_parse('02-11-2021', '%d-%m-%Y'),   45)
) 

-- query
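select subscriber, lifted_date as date,
    -- only the row's own date keeps its rev; generated gap dates get NULL
    if(date = lifted_date, rev, NULL) rev
from
(
    select Subscriber, 
        Rev,
        cast(date as date) date, 
        -- previous date recorded for the same subscriber (NULL for the first row)
        lag(cast(date as date)) over(partition by Subscriber order by date) prev_date
    from dataset
)
-- expand each row into the dates from prev_date through date, then drop
-- prev_date itself because the previous row already emits it
cross join unnest(
        array_except(sequence(coalesce(prev_date, date), date, interval '1' day), array[prev_date])
    ) as t(lifted_date)
order by subscriber, date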

Output:

subscriber date rev
sub123 2021-10-25 00:00:00.000 256
sub123 2021-10-26 00:00:00.000 652
sub123 2021-10-27 00:00:00.000 396
sub456 2021-10-25 00:00:00.000 282
sub456 2021-10-26 00:00:00.000
sub456 2021-10-27 00:00:00.000
sub456 2021-10-28 00:00:00.000 132
sub456 2021-10-29 00:00:00.000 484
sub456 2021-10-30 00:00:00.000
sub456 2021-10-31 00:00:00.000
sub456 2021-11-01 00:00:00.000 96
sub456 2021-11-02 00:00:00.000 45
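
An alternative that stays closer to the attempt in the question is to compute each subscriber's first and last date, build one calendar per subscriber from those bounds, and left join the data back onto it. The following is only a sketch under the same Presto/Trino assumptions as the query above; the CTE names bounds and calendar and the column calendar_date are made up for illustration, and dataset stands in for MAINTABLE.

```sql
-- sample data (same rows as above, written with DATE literals)
WITH dataset (subscriber, date, rev) AS (
    VALUES ('sub123', DATE '2021-10-25', 256),
           ('sub456', DATE '2021-10-25', 282),
           ('sub123', DATE '2021-10-26', 652),
           ('sub123', DATE '2021-10-27', 396),
           ('sub456', DATE '2021-10-28', 132),
           ('sub456', DATE '2021-10-29', 484),
           ('sub456', DATE '2021-11-01', 96),
           ('sub456', DATE '2021-11-02', 45)
),
-- first and last recorded date per subscriber
bounds AS (
    SELECT subscriber, min(date) AS first_date, max(date) AS last_date
    FROM dataset
    GROUP BY subscriber
),
-- one row per subscriber per day, limited to that subscriber's own range
calendar AS (
    SELECT b.subscriber, t.calendar_date
    FROM bounds b
    CROSS JOIN UNNEST(sequence(b.first_date, b.last_date, INTERVAL '1' DAY)) AS t(calendar_date)
)
-- days without a matching row come back with rev = NULL
SELECT c.subscriber, c.calendar_date AS date, d.rev
FROM calendar c
LEFT JOIN dataset d
    ON d.subscriber = c.subscriber
    AND d.date = c.calendar_date
ORDER BY c.subscriber, c.calendar_date
```

Because the calendar is bounded per subscriber, sub123 gets no rows after 27/10/2021, while the gaps for sub456 are filled with NULL. The trade-off compared with the lag approach is an extra aggregation and join over the table.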