CASE with multiple conditions - Teradata/SQL

My dataset in Teradata looks like this:

╔═══════════╦══════════╦══════╗
║ studentid ║   date   ║ days ║
╠═══════════╬══════════╬══════╣
║      1000 ║ 2/1/2017 ║   25 ║
║      1000 ║ 3/8/2017 ║   30 ║
║      1000 ║ 4/4/2017 ║   80 ║
║      1000 ║ 5/1/2017 ║   81 ║
║      1001 ║ 1/1/2017 ║   60 ║
║      1001 ║ 2/1/2017 ║   20 ║
║      1001 ║ 4/1/2017 ║   81 ║
╚═══════════╩══════════╩══════╝

I want a new column (flag) that shows 1 on the rows for the two most recent dates if both of them have a days value of 80 or 81, and 0 otherwise.

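For student 1001, all rows should be 0, because the last two dates are not both 80 or 81. The check must use the last two dates. Although 1001 has an 81 on the latest date, the second-to-last date has 20, so the flag needs to be 0 for both rows.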

Desired output:

╔═══════════╦══════════╦══════╦══════╗
║ studentid ║   date   ║ days ║ flag ║
╠═══════════╬══════════╬══════╬══════╣
║      1000 ║ 2/1/2017 ║   25 ║    0 ║
║      1000 ║ 3/8/2017 ║   30 ║    0 ║
║      1000 ║ 4/4/2017 ║   80 ║    1 ║
║      1000 ║ 5/1/2017 ║   81 ║    1 ║
║      1001 ║ 1/1/2017 ║   60 ║    0 ║
║      1001 ║ 2/1/2017 ║   20 ║    0 ║
║      1001 ║ 4/1/2017 ║   81 ║    0 ║
╚═══════════╩══════════╩══════╩══════╝
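
To reproduce this, the sample data can be loaded into a volatile table. This is only a minimal sketch: the table name tab and the column name date_ are assumptions chosen to match the second query further down (DATE is a reserved word in Teradata, so the column is not named date here), and the dates are assumed to be in month/day/year order, which is the reading consistent with the desired output.

```sql
-- Hypothetical setup: table and column names are assumptions, not from the question.
CREATE VOLATILE TABLE tab (
    studentid INTEGER,
    date_     DATE,
    days      INTEGER
) ON COMMIT PRESERVE ROWS;

-- One INSERT per row; dates are read as month/day/year.
INSERT INTO tab VALUES (1000, DATE '2017-02-01', 25);
INSERT INTO tab VALUES (1000, DATE '2017-03-08', 30);
INSERT INTO tab VALUES (1000, DATE '2017-04-04', 80);
INSERT INTO tab VALUES (1000, DATE '2017-05-01', 81);
INSERT INTO tab VALUES (1001, DATE '2017-01-01', 60);
INSERT INTO tab VALUES (1001, DATE '2017-02-01', 20);
INSERT INTO tab VALUES (1001, DATE '2017-04-01', 81);
```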

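Assign row numbers with row_number, then get the min and max of days over the last two rows of each studentid. After that, use a case expression to check the conditions and assign the flag.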

select studentid,dt,days
,case when rnum in (1,2) and max_days_latest_2 in (80,81) and min_days_latest_2 in (80,81) then 1 else 0 end as flag
from (select t.*
      ,max(case when rnum in (1,2) then days end) over(partition by studentid) as max_days_latest_2
      ,min(case when rnum in (1,2) then days end) over(partition by studentid) as min_days_latest_2
      from (select t.*,row_number() over(partition by studentid order by dt desc) as rnum
            from tbl t
           ) t
     ) t

Since you only need this for the first two rows, you can apply a simple logic, which results in a single STAT-step in Explain:

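If the current row is the first row: check whether this row and the next row both contain one of those values.

If the current row is the second row: check whether this row and the previous row both contain one of those values.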

SELECT studentid, date_, Days,
   CASE Row_Number()
        Over (PARTITION BY studentid
              ORDER BY date_ DESC)
      WHEN 1  -- latest row: this row and the next one (in descending order) must both match
         THEN CASE WHEN Days IN (80,81)
--                    AND Min(Days) Over (PARTITION BY studentid ORDER BY date_ DESC ROWS BETWEEN 1 Following AND 1 Following) IN (80,81)
                    AND Lead(Days) Over (PARTITION BY studentid ORDER BY date_ DESC) IN (80,81)
                   THEN 1
                   ELSE 0
              END
      WHEN 2  -- second-latest row: this row and the previous one must both match
         THEN CASE WHEN Days IN (80,81)
--                    AND Min(Days) Over (PARTITION BY studentid ORDER BY date_ DESC ROWS BETWEEN 1 Preceding AND 1 Preceding) IN (80,81)
                    AND Lag(Days) Over (PARTITION BY studentid ORDER BY date_ DESC) IN (80,81)
                   THEN 1
                   ELSE 0
              END
      ELSE 0  -- all older rows are never flagged
   END AS flag
FROM tab

If your Teradata version doesn't support LEAD/LAG, use the commented-out MIN syntax instead.
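
Written out in full, the MIN variant might look like the sketch below. It assumes the same table tab with columns studentid, date_ and Days as the query above. The one-row window frame (1 Following or 1 Preceding) makes Min return the value of the neighbouring row, or NULL when there is no such row, which is how it stands in for Lead and Lag.

```sql
-- Sketch of the LEAD/LAG-free variant; assumes tab(studentid, date_, Days) as above.
SELECT studentid, date_, Days,
   CASE Row_Number()
        Over (PARTITION BY studentid
              ORDER BY date_ DESC)
      WHEN 1  -- latest row: compare with the next row in descending order
         THEN CASE WHEN Days IN (80,81)
                    AND Min(Days) Over (PARTITION BY studentid
                                        ORDER BY date_ DESC
                                        ROWS BETWEEN 1 Following AND 1 Following) IN (80,81)
                   THEN 1
                   ELSE 0
              END
      WHEN 2  -- second-latest row: compare with the previous row
         THEN CASE WHEN Days IN (80,81)
                    AND Min(Days) Over (PARTITION BY studentid
                                        ORDER BY date_ DESC
                                        ROWS BETWEEN 1 Preceding AND 1 Preceding) IN (80,81)
                   THEN 1
                   ELSE 0
              END
      ELSE 0
   END AS flag
FROM tab;
```

A student with only a single row gets 0 here, because the neighbouring-row Min is NULL and the IN test therefore does not succeed.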

But if you need to apply this logic to more than 2 rows, you need a more generic approach:

SELECT studentid, date_, Days,
   -- check if the first n rows contain only searched values
   CASE WHEN x IS NOT NULL THEN Min(x) Over (PARTITION BY studentid) ELSE 0 END AS flag
FROM
 (
   SELECT studentid, date_, Days,
      CASE
         WHEN Row_Number()
              Over (PARTITION BY studentid
                    ORDER BY date_ DESC) BETWEEN 1 AND 2   -- only for the first n days
         THEN CASE WHEN Days IN (80,81) THEN 1 ELSE 0 END  -- flag the searched values
      END AS x                                             -- NULL for all older rows
   FROM tab AS t
 ) AS dt