Grab the latest date record that is not later than the date in another column (SQL, BigQuery)
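I have two date columns in a BQ table, pageview_date and edited_date, plus an id column. I need to output the data row by row: for each record, I want the value from the edited_date column that is the latest date in that column but not later than the pageview_date value itself. If the two dates are equal, it should stay as is. The value also has to correspond to the id. The data looks like this: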
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/02/22
A 03/01/22 02/02/22
B 03/01/22 01/01/22
B 03/01/22 01/01/22
B 03/01/22 01/31/22
C 03/01/22 04/01/22
C 03/01/22 03/25/22
C 03/01/22 03/01/22
The desired output is:
id pageview_date edited_date
A 03/01/22 02/28/22
A 03/01/22 02/28/22
A 03/01/22 02/28/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
B 03/01/22 01/31/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
C 03/01/22 03/01/22
One approach is to use the MAX window function on the edited_date column, partitioned by id:
WITH sample AS (
  -- Three sample rows for one id; the 2022-03-28 edit falls after the pageview date
  SELECT 'a' AS id, DATE('2022-03-01') AS pageview_date, DATE('2022-02-28') AS edited_date
  UNION ALL
  SELECT 'a' AS id, DATE('2022-03-01') AS pageview_date, DATE('2022-03-28') AS edited_date
  UNION ALL
  SELECT 'a' AS id, DATE('2022-03-01') AS pageview_date, DATE('2022-01-28') AS edited_date
)
SELECT
  id,
  pageview_date,
  -- Edits later than the pageview become NULL, which MAX skips;
  -- the latest remaining edit within each id is returned on every row
  MAX(IF(edited_date <= pageview_date, edited_date, NULL)) OVER (PARTITION BY id) AS new_edited_date
FROM sample
Note that if an id has no edited_date on or before its pageview_date, new_edited_date will be NULL.
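To check the approach against the question's own data, the same window expression can be run over the nine rows from the question. This is only a sketch: it assumes the dates in the question are written as MM/DD/YY (so 03/01/22 is 2022-03-01), and the CTE name question_data is made up for illustration.

```sql
-- Sketch under the assumption that the question's dates are MM/DD/YY.
-- question_data is a hypothetical name standing in for the real table.
WITH question_data AS (
  SELECT 'A' AS id, DATE('2022-03-01') AS pageview_date, DATE('2022-02-28') AS edited_date
  UNION ALL SELECT 'A', DATE('2022-03-01'), DATE('2022-02-02')
  UNION ALL SELECT 'A', DATE('2022-03-01'), DATE('2022-02-02')
  UNION ALL SELECT 'B', DATE('2022-03-01'), DATE('2022-01-01')
  UNION ALL SELECT 'B', DATE('2022-03-01'), DATE('2022-01-01')
  UNION ALL SELECT 'B', DATE('2022-03-01'), DATE('2022-01-31')
  UNION ALL SELECT 'C', DATE('2022-03-01'), DATE('2022-04-01')
  UNION ALL SELECT 'C', DATE('2022-03-01'), DATE('2022-03-25')
  UNION ALL SELECT 'C', DATE('2022-03-01'), DATE('2022-03-01')
)
SELECT
  id,
  pageview_date,
  -- Same expression as above: drop edits after the pageview, keep the latest per id
  MAX(IF(edited_date <= pageview_date, edited_date, NULL)) OVER (PARTITION BY id) AS new_edited_date
FROM question_data
ORDER BY id
-- Every A row gets 2022-02-28, every B row 2022-01-31, every C row 2022-03-01
```

For C, the 04/01/22 and 03/25/22 edits are later than the pageview date and are ignored, while the 03/01/22 edit equals it and is kept, which matches the desired output.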