How to look up id values across groups of rows in the same dataset using an analytic (window) SQL function
--Dataset Name: Jobs
week date job_id
----------------------
wk1 01/15 300
wk1 01/15 301
wk1 01/15 302
wk2 01/22 300
wk2 01/22 302
wk2 01/22 303
wk2 01/22 304
wk3 01/29 302
wk3 01/29 304
wk3 01/29 305
I have a dataset like the one above. I want to create 3 additional columns, namely:
is_job_id_present_in_wk1
is_job_id_present_in_wk2
is_job_id_present_in_wk3
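I want to write a SQL query that flags each row with 1 or 0 in each of the three new columns. I don't want to use a self-join; I want to use an analytic window function.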
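For example, for the first row of the dataset, is_job_id_present_in_wk1 and is_job_id_present_in_wk2 would be 1 and is_job_id_present_in_wk3 would be 0, because job_id 300 appears in weeks 1 and 2 but not in week 3. For the third row, all three flags would be 1, because job_id 302 appears in all three weeks.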
For the second row, is_job_id_present_in_wk1 would be 1, and is_job_id_present_in_wk2 and is_job_id_present_in_wk3 would both be 0, because job_id 301 appears only in week 1.
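The expected flags can be checked directly against the sample rows with a few lines of plain Python. This is only a verification sketch, not an SQL answer, and `ROWS` and `presence_flags` are illustrative names: it collects the set of weeks in which each job_id appears, then tests each row's job_id against each week.

```python
# Sample rows from the Jobs dataset: (week, date, job_id)
ROWS = [
    ("wk1", "01/15", 300), ("wk1", "01/15", 301), ("wk1", "01/15", 302),
    ("wk2", "01/22", 300), ("wk2", "01/22", 302), ("wk2", "01/22", 303),
    ("wk2", "01/22", 304),
    ("wk3", "01/29", 302), ("wk3", "01/29", 304), ("wk3", "01/29", 305),
]


def presence_flags(rows, weeks=("wk1", "wk2", "wk3")):
    """Return (week, date, job_id, flag_wk1, flag_wk2, flag_wk3) for each row."""
    # Map every job_id to the set of weeks it appears in.
    weeks_by_job = {}
    for week, _, job_id in rows:
        weeks_by_job.setdefault(job_id, set()).add(week)
    # A flag is 1 if the row's job_id appears in that week anywhere in the data.
    return [
        (week, date, job_id, *(int(w in weeks_by_job[job_id]) for w in weeks))
        for week, date, job_id in rows
    ]


for row in presence_flags(ROWS):
    print(row)
```

Running it shows, for instance, that job_id 300 gets (1, 1, 0) because it never appears in wk3, while job_id 302 gets (1, 1, 1).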
What I've tried so far:
SELECT week, date, job_id
     , CASE WHEN job_id =
            FIRST_VALUE(CASE WHEN week = 'wk1' THEN job_id ELSE NULL END)
              OVER (ORDER BY job_id ROWS BETWEEN CURRENT ROW AND CURRENT ROW)
       THEN 1 ELSE 0 END AS is_job_id_present_in_wk1
FROM jobs;
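To see why this attempt cannot work, here is a small reproduction using Python's built-in sqlite3 module with an in-memory copy of the sample data. SQLite is an assumption standing in for whatever database the question targets, and it needs version 3.25 or later for window functions. Because the frame ROWS BETWEEN CURRENT ROW AND CURRENT ROW contains only the current row, FIRST_VALUE never sees other rows with the same job_id. The flag therefore ends up 1 for every wk1 row and 0 for everything else, so job_id 300 in wk2 is wrongly flagged 0.

```python
import sqlite3

# In-memory copy of the sample Jobs table (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (week TEXT, date TEXT, job_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("wk1", "01/15", 300), ("wk1", "01/15", 301), ("wk1", "01/15", 302),
        ("wk2", "01/22", 300), ("wk2", "01/22", 302), ("wk2", "01/22", 303),
        ("wk2", "01/22", 304),
        ("wk3", "01/29", 302), ("wk3", "01/29", 304), ("wk3", "01/29", 305),
    ],
)

# The attempted query. The one-row frame means FIRST_VALUE just returns this
# row's own CASE value: job_id for wk1 rows, NULL otherwise. job_id = NULL is
# never true, so the CASE falls through to 0 for every non-wk1 row.
ATTEMPT = """
SELECT week, date, job_id,
       CASE WHEN job_id =
            FIRST_VALUE(CASE WHEN week = 'wk1' THEN job_id ELSE NULL END)
              OVER (ORDER BY job_id ROWS BETWEEN CURRENT ROW AND CURRENT ROW)
       THEN 1 ELSE 0 END AS is_job_id_present_in_wk1
FROM jobs
ORDER BY week, job_id
"""
attempt_rows = conn.execute(ATTEMPT).fetchall()
for row in attempt_rows:
    print(row)
```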
Try this:
SELECT week, date, job_id,
       max( case when week = 'wk1' then 1 else 0 end )
           over (partition by job_id) as is_job_id_present_in_wk1,
       max( case when week = 'wk2' then 1 else 0 end )
           over (partition by job_id) as is_job_id_present_in_wk2,
       max( case when week = 'wk3' then 1 else 0 end )
           over (partition by job_id) as is_job_id_present_in_wk3
FROM jobs;
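Here is a runnable check of this approach against the sample data, assuming SQLite 3.25 or later through Python's sqlite3 module, with the third alias spelled is_job_id_present_in_wk3. MAX over PARTITION BY job_id looks at every row that shares the current row's job_id. It returns 1 if any of those rows has the given week and 0 otherwise, and no join is needed.

```python
import sqlite3

# In-memory copy of the sample Jobs table (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (week TEXT, date TEXT, job_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("wk1", "01/15", 300), ("wk1", "01/15", 301), ("wk1", "01/15", 302),
        ("wk2", "01/22", 300), ("wk2", "01/22", 302), ("wk2", "01/22", 303),
        ("wk2", "01/22", 304),
        ("wk3", "01/29", 302), ("wk3", "01/29", 304), ("wk3", "01/29", 305),
    ],
)

# Each MAX(...) OVER (PARTITION BY job_id) scans all rows of the same job_id:
# the result is 1 if at least one of them belongs to that week, else 0.
WINDOW_QUERY = """
SELECT week, date, job_id,
       MAX(CASE WHEN week = 'wk1' THEN 1 ELSE 0 END)
           OVER (PARTITION BY job_id) AS is_job_id_present_in_wk1,
       MAX(CASE WHEN week = 'wk2' THEN 1 ELSE 0 END)
           OVER (PARTITION BY job_id) AS is_job_id_present_in_wk2,
       MAX(CASE WHEN week = 'wk3' THEN 1 ELSE 0 END)
           OVER (PARTITION BY job_id) AS is_job_id_present_in_wk3
FROM jobs
ORDER BY week, job_id
"""
window_rows = conn.execute(WINDOW_QUERY).fetchall()
for row in window_rows:
    print(row)
```

The outer ORDER BY only makes the output deterministic for display. The window functions themselves do not depend on row order, because there is no ORDER BY inside OVER.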
Also try this version:
SELECT week, date, job_id
     , CASE WHEN EXISTS( SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk1' )
            THEN 1 ELSE 0 END as is_job_id_present_in_wk1
     , CASE WHEN EXISTS( SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk2' )
            THEN 1 ELSE 0 END as is_job_id_present_in_wk2
     , CASE WHEN EXISTS( SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk3' )
            THEN 1 ELSE 0 END as is_job_id_present_in_wk3
FROM jobs;
This version may be faster than the one using analytic functions, especially if you create a composite index on the (job_id, week) columns.
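Here is a sketch that runs the EXISTS version next to the window version on the sample data and checks that both produce the same flags. It assumes SQLite through Python's sqlite3 module, and the index name idx_jobs_job_id_week is hypothetical. The composite index on (job_id, week) is the one suggested above, so each correlated EXISTS can be answered by an index lookup. EXPLAIN QUERY PLAN is printed so you can see whether the index is used, but its text varies between SQLite versions. On a 10-row table this says nothing about real-world speed.

```python
import sqlite3

# In-memory copy of the sample Jobs table (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (week TEXT, date TEXT, job_id INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("wk1", "01/15", 300), ("wk1", "01/15", 301), ("wk1", "01/15", 302),
        ("wk2", "01/22", 300), ("wk2", "01/22", 302), ("wk2", "01/22", 303),
        ("wk2", "01/22", 304),
        ("wk3", "01/29", 302), ("wk3", "01/29", 304), ("wk3", "01/29", 305),
    ],
)
# Hypothetical index name; composite key (job_id, week) serves the EXISTS lookups.
conn.execute("CREATE INDEX idx_jobs_job_id_week ON jobs (job_id, week)")

# One correlated EXISTS per week: inside each subquery, jobs refers to the outer row.
EXISTS_QUERY = """
SELECT week, date, job_id,
       CASE WHEN EXISTS (SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk1')
            THEN 1 ELSE 0 END AS is_job_id_present_in_wk1,
       CASE WHEN EXISTS (SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk2')
            THEN 1 ELSE 0 END AS is_job_id_present_in_wk2,
       CASE WHEN EXISTS (SELECT 1 FROM jobs job1
                         WHERE job1.job_id = jobs.job_id AND job1.week = 'wk3')
            THEN 1 ELSE 0 END AS is_job_id_present_in_wk3
FROM jobs
ORDER BY week, job_id
"""

# The window-function version, for comparison.
WINDOW_QUERY = """
SELECT week, date, job_id,
       MAX(CASE WHEN week = 'wk1' THEN 1 ELSE 0 END) OVER (PARTITION BY job_id),
       MAX(CASE WHEN week = 'wk2' THEN 1 ELSE 0 END) OVER (PARTITION BY job_id),
       MAX(CASE WHEN week = 'wk3' THEN 1 ELSE 0 END) OVER (PARTITION BY job_id)
FROM jobs
ORDER BY week, job_id
"""

exists_rows = conn.execute(EXISTS_QUERY).fetchall()
window_rows = conn.execute(WINDOW_QUERY).fetchall()
print("same result:", exists_rows == window_rows)

# Inspect the plan to check whether idx_jobs_job_id_week is used (output varies by version).
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + EXISTS_QUERY):
    print(plan_row)
```

To compare performance, run both queries on a realistically sized table in your own database and look at its execution plans. The sample data is far too small for timing to mean anything.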