CASE with multiple conditions - Teradata/SQL
My dataset in Teradata looks like this:
╔═══════════╦══════════╦══════╗
║ studentid ║ date ║ days ║
╠═══════════╬══════════╬══════╣
║ 1000 ║ 2/1/2017 ║ 25 ║
║ 1000 ║ 3/8/2017 ║ 30 ║
║ 1000 ║ 4/4/2017 ║ 80 ║
║ 1000 ║ 5/1/2017 ║ 81 ║
║ 1001 ║ 1/1/2017 ║ 60 ║
║ 1001 ║ 2/1/2017 ║ 20 ║
║ 1001 ║ 4/1/2017 ║ 81 ║
╚═══════════╩══════════╩══════╝
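I want a new column (flag) that shows 1 on the rows if the two most recent dates both have 80 or 81 days, and 0 otherwise.
For student 1001 all rows should be 0, because the last two dates are not both 80 or 81. It has to take the last two dates: although 1001 has an 81, the second-to-last date has 20, so the flag must be 0 for both rows.
Desired output: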
╔═══════════╦══════════╦══════╦══════╗
║ studentid ║ date ║ days ║ flag ║
╠═══════════╬══════════╬══════╬══════╣
║ 1000 ║ 2/1/2017 ║ 25 ║ 0 ║
║ 1000 ║ 3/8/2017 ║ 30 ║ 0 ║
║ 1000 ║ 4/4/2017 ║ 80 ║ 1 ║
║ 1000 ║ 5/1/2017 ║ 81 ║ 1 ║
║ 1001 ║ 1/1/2017 ║ 60 ║ 0 ║
║ 1001 ║ 2/1/2017 ║ 20 ║ 0 ║
║ 1001 ║ 4/1/2017 ║ 81 ║ 0 ║
╚═══════════╩══════════╩══════╩══════╝
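To pin the rule down before looking at SQL, here is a small reference sketch in plain Python. The function name `flag_rows` and its parameters are made up for illustration. It assumes the dates are month/day/year, and it treats a student with fewer than two rows as not qualifying, which the question does not specify.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (studentid, date, days).
# Assumption: the dates in the question are month/day/year.
ROWS = [
    (1000, date(2017, 2, 1), 25),
    (1000, date(2017, 3, 8), 30),
    (1000, date(2017, 4, 4), 80),
    (1000, date(2017, 5, 1), 81),
    (1001, date(2017, 1, 1), 60),
    (1001, date(2017, 2, 1), 20),
    (1001, date(2017, 4, 1), 81),
]


def flag_rows(rows, wanted=(80, 81), n=2):
    """Map (studentid, date) to 1 if the row is among the student's n latest
    rows and all n latest rows have days in `wanted`; otherwise 0."""
    by_student = defaultdict(list)
    for studentid, day, days in rows:
        by_student[studentid].append((day, days))

    flags = {}
    for studentid, items in by_student.items():
        items.sort(reverse=True)  # newest date first
        latest = items[:n]
        # Assumption: fewer than n rows means the student does not qualify.
        qualifies = len(latest) == n and all(d in wanted for _, d in latest)
        for position, (day, _) in enumerate(items):
            flags[(studentid, day)] = 1 if qualifies and position < n else 0
    return flags


flags = flag_rows(ROWS)
for studentid, day, days in ROWS:
    print(studentid, day, days, flags[(studentid, day)])
```

Only the two latest rows of student 1000 get a 1. Student 1001 fails because its second-latest row has 20 days.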
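Use row_number to assign row numbers, then get the min and max values of the last two rows for each studentid. After that, use a case expression to check the conditions and assign the flag.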
-- dt is the date column from the question, tbl is the table
select studentid, dt, days
      ,case when rnum in (1,2)
             and max_days_latest_2 in (80,81)
             and min_days_latest_2 in (80,81)
            then 1 else 0 end as flag
from (select t.*
            ,max(case when rnum in (1,2) then days end) over(partition by studentid) as max_days_latest_2
            ,min(case when rnum in (1,2) then days end) over(partition by studentid) as min_days_latest_2
      from (select t.*
                  ,row_number() over(partition by studentid order by dt desc) as rnum -- 1 = latest date
            from tbl t
           ) t
     ) t
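As a quick sanity check, the query above can be run against the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Teradata, which is an assumption: it needs SQLite 3.25 or later for window functions, and the dates are stored as ISO text so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; dt holds ISO-formatted dates so text ordering is date ordering.
conn.execute("CREATE TABLE tbl (studentid INTEGER, dt TEXT, days INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1000, "2017-02-01", 25),
        (1000, "2017-03-08", 30),
        (1000, "2017-04-04", 80),
        (1000, "2017-05-01", 81),
        (1001, "2017-01-01", 60),
        (1001, "2017-02-01", 20),
        (1001, "2017-04-01", 81),
    ],
)

QUERY = """
select studentid, dt, days
      ,case when rnum in (1,2)
             and max_days_latest_2 in (80,81)
             and min_days_latest_2 in (80,81)
            then 1 else 0 end as flag
from (select t.*
            ,max(case when rnum in (1,2) then days end) over(partition by studentid) as max_days_latest_2
            ,min(case when rnum in (1,2) then days end) over(partition by studentid) as min_days_latest_2
      from (select t.*
                  ,row_number() over(partition by studentid order by dt desc) as rnum
            from tbl t
           ) t
     ) t
order by studentid, dt  -- added only to make the output order deterministic
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With exactly two rows in scope, the min and the max are the two values themselves, so testing both against (80,81) is the same as testing each row. That shortcut does not carry over to three or more rows.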
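For the first two rows you can apply a simple logic, which results in a single STAT step in Explain:
If the current row is the first row: check whether this row and the next row both contain one of those values.
If the current row is the second row: check whether this row and the previous row both contain one of those values.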
SELECT studentid, date_, Days,
   CASE Row_Number()
        Over (PARTITION BY studentid
              ORDER BY date_ DESC)
     WHEN 1 -- latest row: compare with the next (older) row
       THEN CASE WHEN Days IN (80,81)
                   -- AND Min(Days) Over (PARTITION BY studentid ORDER BY date_ DESC ROWS BETWEEN 1 Following AND 1 Following) IN (80,81)
                   AND Lead(Days) Over (PARTITION BY studentid ORDER BY date_ DESC) IN (80,81)
                 THEN 1
                 ELSE 0
            END
     WHEN 2 -- second-latest row: compare with the previous (newer) row
       THEN CASE WHEN Days IN (80,81)
                   -- AND Min(Days) Over (PARTITION BY studentid ORDER BY date_ DESC ROWS BETWEEN 1 Preceding AND 1 Preceding) IN (80,81)
                   AND Lag(Days) Over (PARTITION BY studentid ORDER BY date_ DESC) IN (80,81)
                 THEN 1
                 ELSE 0
            END
     ELSE 0
   END AS flag
FROM tab
If your Teradata version doesn't support lead/lag, use the commented-out min syntax instead.
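The substitution works because a frame of ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING contains exactly one row, so Min over it returns that row's value, just like Lead. The sketch below checks this on the sample data, again with sqlite3 standing in for Teradata (an assumption; SQLite 3.25 or later is required). The `TEMPLATE` and `WINDOW` helpers are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; date_ holds ISO-formatted dates so text ordering is date ordering.
conn.execute("CREATE TABLE tab (studentid INTEGER, date_ TEXT, days INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1000, "2017-02-01", 25),
        (1000, "2017-03-08", 30),
        (1000, "2017-04-04", 80),
        (1000, "2017-05-01", 81),
        (1001, "2017-01-01", 60),
        (1001, "2017-02-01", 20),
        (1001, "2017-04-01", 81),
    ],
)

# Same query shape as the answer; only the neighbour-row expressions vary.
TEMPLATE = """
SELECT studentid, date_, days,
   CASE Row_Number() OVER (PARTITION BY studentid ORDER BY date_ DESC)
     WHEN 1 THEN CASE WHEN days IN (80,81) AND {next_row} IN (80,81) THEN 1 ELSE 0 END
     WHEN 2 THEN CASE WHEN days IN (80,81) AND {prev_row} IN (80,81) THEN 1 ELSE 0 END
     ELSE 0
   END AS flag
FROM tab
ORDER BY studentid, date_
"""
WINDOW = "OVER (PARTITION BY studentid ORDER BY date_ DESC"

lead_lag_sql = TEMPLATE.format(
    next_row=f"Lead(days) {WINDOW})",
    prev_row=f"Lag(days) {WINDOW})",
)
min_frame_sql = TEMPLATE.format(
    next_row=f"Min(days) {WINDOW} ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)",
    prev_row=f"Min(days) {WINDOW} ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)",
)

flags_lead_lag = [row[3] for row in conn.execute(lead_lag_sql)]
flags_min_frame = [row[3] for row in conn.execute(min_frame_sql)]
print(flags_lead_lag)
print(flags_min_frame)
```

When no neighbouring row exists, both variants yield NULL, the IN test is not true, and the flag falls through to 0.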
But if you need to apply this logic to more than two rows, you need a more generic approach:
SELECT studentid, date_, Days,
   -- check if the first n rows contain only searched values
   CASE WHEN x IS NOT NULL THEN Min(x) Over (PARTITION BY studentid) ELSE 0 END AS flag
FROM
 (
   SELECT studentid, date_, Days,
      CASE
         WHEN Row_Number()
              Over (PARTITION BY studentid
                    ORDER BY date_ DESC) BETWEEN 1 AND 2 -- only for the first n days
         THEN CASE WHEN Days IN (80,81) THEN 1 ELSE 0 END -- flag the searched values
      END AS x -- NULL for all rows after the first n
   FROM tab AS t
 ) AS dt
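To see how the generic version behaves for different n, the sketch below binds the row limit as a parameter and runs the query for n = 1, 2, and 3. It uses sqlite3 as a stand-in for Teradata (an assumption; SQLite 3.25 or later is required), and the `:n` parameter and `flags_for` helper are illustrative additions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; date_ holds ISO-formatted dates so text ordering is date ordering.
conn.execute("CREATE TABLE tab (studentid INTEGER, date_ TEXT, days INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1000, "2017-02-01", 25),
        (1000, "2017-03-08", 30),
        (1000, "2017-04-04", 80),
        (1000, "2017-05-01", 81),
        (1001, "2017-01-01", 60),
        (1001, "2017-02-01", 20),
        (1001, "2017-04-01", 81),
    ],
)

QUERY = """
SELECT studentid, date_, days,
   CASE WHEN x IS NOT NULL THEN Min(x) OVER (PARTITION BY studentid) ELSE 0 END AS flag
FROM
 (
   SELECT studentid, date_, days,
      CASE
         WHEN Row_Number()
              OVER (PARTITION BY studentid
                    ORDER BY date_ DESC) BETWEEN 1 AND :n  -- hypothetical bound parameter
         THEN CASE WHEN days IN (80,81) THEN 1 ELSE 0 END
      END AS x
   FROM tab AS t
 ) AS dt
ORDER BY studentid, date_
"""


def flags_for(n):
    """Return the flag column, ordered by studentid and date, for the n latest rows."""
    return [row[3] for row in conn.execute(QUERY, {"n": n})]


for n in (1, 2, 3):
    print(n, flags_for(n))
```

Min(x) ignores the NULLs outside the first n rows, so it is 1 only when every one of those n rows matched. With n = 3 no student qualifies in the sample data, because the third-latest rows hold 30 and 60.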