SQL: Missing dates for specific identifiers, without adding extra dates once the identifier is no longer in the database
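To put the problem into words: I have a huge table with subscribers and their daily data. Once a subscriber no longer exists, it has no further records. For example, if sub123 no longer exists from 28/10/2021, it has a record for every day up to and including 27/10/2021. The current problem is that some subscribers are missing dates, possibly because of weekends or some other issue. I want to fill in these missing records with NULL values so that they are recorded.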
Current data:
Subscriber | Date | Rev |
---|---|---|
sub123 | 25/10/2021 | 256 |
sub456 | 25/10/2021 | 282 |
sub123 | 26/10/2021 | 652 |
sub123 | 27/10/2021 | 396 |
sub456 | 28/10/2021 | 132 |
sub456 | 29/10/2021 | 484 |
sub456 | 01/11/2021 | 96 |
sub456 | 02/11/2021 | 45 |
Desired result:
Subscriber | Date | Rev |
---|---|---|
sub123 | 25/10/2021 | 256 |
sub456 | 25/10/2021 | 282 |
sub123 | 26/10/2021 | 652 |
sub456 | 26/10/2021 | NULL |
sub123 | 27/10/2021 | 396 |
sub456 | 27/10/2021 | NULL |
sub456 | 28/10/2021 | 132 |
sub456 | 29/10/2021 | 484 |
sub456 | 30/10/2021 | NULL |
sub456 | 31/10/2021 | NULL |
sub456 | 01/11/2021 | 96 |
sub456 | 02/11/2021 | 45 |
My current attempt:

```sql
WITH all_dates as (
    SELECT
        CAST(date_column AS DATE) date_column, b.subscriber, b.date
    FROM
        (VALUES
            (SEQUENCE(
                min(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
                max(b.date) OVER (PARTITION BY b.subscriber ORDER BY b.date),
                INTERVAL '1' DAY)
            )
        ) AS t1(date_array)
    CROSS JOIN
        UNNEST(date_array) AS t2(date_column)
    LEFT JOIN MAINTABLE b
        on t2.date_column = b.date
),
customer_dates as (
    SELECT distinct a.subscriber, a.date, b.date_column
    from MAINTABLE a
    left join all_dates b
        on a.date = b.date_column
)
SELECT *
from customer_dates a
```
This code does not work, but it shows what I am trying to do. If I use the code below instead, it generates every date from the start date to the end date for all subscribers, which is not what we want; that is why we tried the code above.

```sql
WITH all_dates as (
    SELECT
        CAST(date_column AS DATE) date_column, b.subscriber, b.date
    FROM
        (VALUES
            (SEQUENCE(
                date('2021-10-25'),
                date('2022-04-30'),
                INTERVAL '1' DAY)
            )
        ) AS t1(date_array)
    CROSS JOIN
        UNNEST(date_array) AS t2(date_column)
    LEFT JOIN MAINTABLE b
        on t2.date_column = b.date
),
customer_dates as (
    SELECT distinct a.subscriber, a.date, b.date_column
    from MAINTABLE a
    left join all_dates b
        on a.date = b.date_column
)
SELECT *
from customer_dates a
```
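For comparison, the per-subscriber range that the first attempt was aiming for can be built without window functions: compute each subscriber's first and last date with `GROUP BY`, expand that range with `sequence` and `UNNEST`, then `LEFT JOIN` the original rows back on both subscriber and date. The first attempt fails as written because the `VALUES` row cannot reference `b` (it is only introduced by the later join) and cannot evaluate window functions, and both attempts join back on the date alone rather than on subscriber and date. The following is a minimal Trino/Athena-style sketch, not the answer's method. It assumes the table has columns `subscriber`, `date` (of type DATE) and `rev`. The inline `dataset` CTE is a stand-in for `MAINTABLE`, and the names `bounds`, `calendar` and `cal_date` are illustrative:

```sql
-- Sketch only: "dataset" stands in for MAINTABLE (assumed columns subscriber, date DATE, rev)
WITH dataset (subscriber, date, rev) AS (
    VALUES ('sub123', DATE '2021-10-25', 256),
           ('sub456', DATE '2021-10-25', 282),
           ('sub123', DATE '2021-10-26', 652),
           ('sub123', DATE '2021-10-27', 396),
           ('sub456', DATE '2021-10-28', 132),
           ('sub456', DATE '2021-10-29', 484),
           ('sub456', DATE '2021-11-01', 96),
           ('sub456', DATE '2021-11-02', 45)
),
-- one row per subscriber: its own first and last recorded day
bounds AS (
    SELECT subscriber, min(date) AS first_date, max(date) AS last_date
    FROM dataset
    GROUP BY subscriber
),
-- expand each subscriber's range into one row per calendar day
calendar AS (
    SELECT b.subscriber, t.cal_date
    FROM bounds b
    CROSS JOIN UNNEST(sequence(b.first_date, b.last_date, INTERVAL '1' DAY)) AS t(cal_date)
)
-- attach Rev where a record exists; days with no record get NULL
SELECT c.subscriber, c.cal_date AS date, d.rev
FROM calendar c
LEFT JOIN dataset d
  ON d.subscriber = c.subscriber
 AND d.date = c.cal_date
ORDER BY c.cal_date, c.subscriber
```

Because each range ends at the subscriber's own last record, sub123 is not extended past 27/10/2021. Unlike the `lag` approach, this sketch needs a second pass over the table for the join back. If a subscriber can have several rows on the same day, the join returns all of them.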
You can use the `lag` function to generate the missing ranges, flatten them with `unnest`, and handle `Rev` separately:
```sql
-- sample data
WITH dataset (Subscriber, Date, Rev) AS (
    VALUES ('sub123', date_parse('25-10-2021', '%d-%m-%Y'), 256),
           ('sub456', date_parse('25-10-2021', '%d-%m-%Y'), 282),
           ('sub123', date_parse('26-10-2021', '%d-%m-%Y'), 652),
           ('sub123', date_parse('27-10-2021', '%d-%m-%Y'), 396),
           ('sub456', date_parse('28-10-2021', '%d-%m-%Y'), 132),
           ('sub456', date_parse('29-10-2021', '%d-%m-%Y'), 484),
           ('sub456', date_parse('01-11-2021', '%d-%m-%Y'), 96),
           ('sub456', date_parse('02-11-2021', '%d-%m-%Y'), 45)
)
-- query
-- Rev is kept only on the day the record actually exists; generated days get NULL
select subscriber, lifted_date as date, if(date = lifted_date, rev, NULL) rev
from
(
    -- prev_date: the subscriber's previous recorded day (NULL for its first record)
    select Subscriber,
           Rev,
           cast(date as date) date,
           lag(cast(date as date)) over(partition by Subscriber order by date) prev_date
    from dataset
)
-- for each record, generate every day from prev_date up to date and drop prev_date
-- itself (the previous record already emitted it); a first record yields only its own date
cross join unnest(
    array_except(sequence(coalesce(prev_date, date), date, interval '1' day), array[prev_date])
) as t(lifted_date)
order by subscriber, date
```
Output:
subscriber | date | rev |
---|---|---|
sub123 | 2021-10-25 00:00:00.000 | 256 |
sub123 | 2021-10-26 00:00:00.000 | 652 |
sub123 | 2021-10-27 00:00:00.000 | 396 |
sub456 | 2021-10-25 00:00:00.000 | 282 |
sub456 | 2021-10-26 00:00:00.000 | |
sub456 | 2021-10-27 00:00:00.000 | |
sub456 | 2021-10-28 00:00:00.000 | 132 |
sub456 | 2021-10-29 00:00:00.000 | 484 |
sub456 | 2021-10-30 00:00:00.000 | |
sub456 | 2021-10-31 00:00:00.000 | |
sub456 | 2021-11-01 00:00:00.000 | 96 |
sub456 | 2021-11-02 00:00:00.000 | 45 |
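To see what the `lag` plus `array_except` step produces, the core of the answer's query can be run on its own for one subscriber. This is a sketch assuming Trino/Athena. It keeps only the sub456 rows from the sample data and uses `DATE` literals instead of `date_parse`, so the values are dates rather than timestamps. The column name `filled` is illustrative:

```sql
-- Sketch: inspect prev_date and the generated day arrays for sub456 only
WITH dataset (Subscriber, Date, Rev) AS (
    VALUES ('sub456', DATE '2021-10-25', 282),
           ('sub456', DATE '2021-10-28', 132),
           ('sub456', DATE '2021-10-29', 484),
           ('sub456', DATE '2021-11-01', 96),
           ('sub456', DATE '2021-11-02', 45)
)
SELECT date,
       prev_date,
       -- days from prev_date to date, minus prev_date (already covered by the previous row)
       array_except(
           sequence(coalesce(prev_date, date), date, INTERVAL '1' DAY),
           array[prev_date]) AS filled
FROM (
    SELECT date,
           lag(date) OVER (PARTITION BY Subscriber ORDER BY date) AS prev_date
    FROM dataset
)
ORDER BY date
-- Values per row (date, prev_date, filled), not exact client formatting:
-- 2021-10-25  NULL        [2021-10-25]
-- 2021-10-28  2021-10-25  [2021-10-26, 2021-10-27, 2021-10-28]
-- 2021-10-29  2021-10-28  [2021-10-29]
-- 2021-11-01  2021-10-29  [2021-10-30, 2021-10-31, 2021-11-01]
-- 2021-11-02  2021-11-01  [2021-11-02]
```

Unnesting each `filled` array and keeping `Rev` only where the generated day equals the record's own date gives the sub456 rows shown in the output table: the gaps on 26 and 27 October and on 30 and 31 October appear with an empty (NULL) `rev`.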