Comparing TIMESTAMP data between the previous row and the current row with a CASE statement, and acting on the difference, in Teradata
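I need help building a query with Teradata (version 16.0+) OLAP functions that compares consecutive rows and rolls up duplicate records in a Teradata table, for the scenario below.
I have the following 9 records in table ABC.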
Existing data (table ABC):
ROW  ACCOUNT_ID       EXT_REF_NO        SERIAL_NUM    RECORD_START_DT     RECORD_END_DT
1    100000000002195  8495752450757852  341FE4E6A1AF  8/13/2019 12:24:42  8/20/2019 23:59:59
2    100000000002195  8495752450757852  342FE4E6A1AF  8/21/2019 08:49:08  8/25/2019 23:59:59
3    100000000002195  8495752450757852  343FE4E6A1AF  8/27/2019 02:42:46  8/26/2019 23:59:59
4    100000000002195  8495752450757852  344FE4E6A1AF  8/28/2019 06:33:50  8/28/2019 23:59:59
5    100000000002195  8495752450757852  345FE4E6A1AF  8/30/2019 02:35:32  8/31/2019 23:59:59
6    100000000002195  8495752450757852  346FE4E6A1AF  9/2/2019 00:25:05   9/1/2019 23:59:59
7    100000000002195  8495752450757852  347FE4E6A1AF  9/3/2019 03:33:28   9/3/2019 23:59:59
8    100000000002195  8495752450757852  348FE4E6A1AF  9/4/2019 18:35:45   9/8/2019 23:59:59
9    100000000002195  8495752450757852  349FE4E6A1AF  9/10/2019 11:22:54  3/16/2020 23:59:59
Expected output:
ROW  ACCOUNT_ID       EXT_REF_NO        SERIAL_NUM    RECORD_START_DT     RECORD_END_DT
1    100000000002195  8495752450757852  341FE4E6A1AF  8/13/2019 12:24:42  8/26/2019 23:59:59
2    100000000002195  8495752450757852  342FE4E6A1AF  8/28/2019 06:33:50  8/28/2019 23:59:59
3    100000000002195  8495752450757852  343FE4E6A1AF  8/30/2019 02:35:32  9/1/2019 23:59:59
4    100000000002195  8495752450757852  345FE4E6A1AF  9/3/2019 03:33:28   9/8/2019 23:59:59
5    100000000002195  8495752450757852  346FE4E6A1AF  9/10/2019 11:22:54  3/16/2020 23:59:59
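Rules:
1. RECORD_END_DT should always be greater than RECORD_START_DT.
2. Records are rolled up only when the current row's RECORD_START_DT falls on the previous row's RECORD_END_DT + 1 day. If the difference is more than 1 day, the rows are not merged.
3. Rows 3 and 6 violate point 1. This is actually a data-entry bug where a record was expired on the same day it was entered. For the calculation you can treat RECORD_START_DT as 8/26/2019 00:00:00 for row 3 and 9/2/2019 00:00:00 for row 6.
4. All three columns, ACCOUNT_ID, EXT_REF_NO and SERIAL_NUM, should be considered for the PARTITION BY.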
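The rows that break rule 1 can be surfaced with a plain filter before any window logic is involved. The following is only a minimal sketch: it assumes the table is named ABC as stated in the question, and fixed_start is an illustrative alias. It shifts the start back by one day, which keeps the time of day rather than resetting it to midnight.

```sql
-- Sketch, assuming table ABC with the columns listed in the question.
-- Lists the records whose start is later than their end (rule 1 violations)
-- together with a start shifted back by one day.
SELECT
    ACCOUNT_ID,
    EXT_REF_NO,
    SERIAL_NUM,
    RECORD_START_DT,
    RECORD_END_DT,
    RECORD_START_DT - INTERVAL '1' DAY AS fixed_start  -- illustrative alias
FROM ABC
WHERE RECORD_START_DT > RECORD_END_DT;
-- On the nine sample records this condition holds for rows 3 and 6 only.
```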
I tried something like the query below, but it returns only a single row with the minimum DEVICE_START_DATE and the maximum DEVICE_END_DATE, as shown here:
ACCOUNT_ID       EXT_REF           SERIAL_NUM    DEVICE_START_DATE          DEVICE_END_DATE
100000000002195  8495752450757852  341FE4E6A1AF  8/13/2017 12:24:42.000000  9/16/2017 23:59:59.000000
Query:
SELECT
    ACCOUNT_ID,
    EXT_REF,
    SERIAL_NUM,
    CASE WHEN (B.DIFF_DAYS <= 1 OR B.DIFF_DAYS IS NULL) THEN
             MIN(DEVICE_START_DATE)
             OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
                   ORDER BY DEVICE_END_DATE DESC)
         WHEN (B.DIFF_DAYS > 1) THEN
             MIN(DEVICE_START_DATE)
             OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
                   ORDER BY DEVICE_END_DATE DESC)
    END AS DEVICE_START_DATE,
    DEVICE_END_DATE
FROM
    (SELECT A.ACCOUNT_ID,
            A.EXT_REF,
            A.SERIAL_NUM,
            A.DEVICE_START_DATE,
            A.DEVICE_START_DATE_VIRTUAL,
            A.DEVICE_END_DATE,
            MIN(A.DEVICE_END_DATE)
            OVER (PARTITION BY A.ACCOUNT_ID, A.EXT_REF, A.SERIAL_NUM
                  ORDER BY A.DEVICE_END_DATE
                  ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS DEVICE_END_DATE_PREVIOUS_ROW,
            TRUNC(A.DEVICE_START_DATE_VIRTUAL) - TRUNC(DEVICE_END_DATE_PREVIOUS_ROW) AS DIFF_DAYS
     FROM
         (SELECT
              ACCOUNT_ID,
              EXT_REF,
              SERIAL_NUM,
              DEVICE_START_DATE,
              CASE WHEN DEVICE_START_DATE > DEVICE_END_DATE
                   THEN (DEVICE_START_DATE - INTERVAL '1' DAY)
                   ELSE DEVICE_START_DATE
              END AS DEVICE_START_DATE_VIRTUAL,
              DEVICE_END_DATE
          FROM NDW_XH_TEMP_TABLES.TEST) A) B
QUALIFY
    ROW_NUMBER()
    OVER (PARTITION BY ACCOUNT_ID, EXT_REF, SERIAL_NUM
          ORDER BY DEVICE_END_DATE DESC) = 1;
You need nested OLAP functions. This should work as expected:
SELECT
   ACCOUNT_ID
  ,EXT_REF_NO
  ,SERIAL_NUM
  ,Coalesce(Lag(next_start)
            Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
                  ORDER BY next_start NULLS LAST)
           ,min_start) AS RECORD_START_DT
   -- If your Teradata version doesn't support LAG/LEAD you must switch to the MAX version
-- ,Coalesce(Max(next_start)
--           Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
--                 ORDER BY next_start NULLS LAST
--                 ROWS BETWEEN 1 Preceding AND 1 Preceding)
--          ,min_start) AS RECORD_START_DT
  ,RECORD_END_DT
FROM
 (
   SELECT
      ACCOUNT_ID
     ,EXT_REF_NO
     ,SERIAL_NUM
     ,RECORD_START_DT
     ,RECORD_END_DT
      -- to check for a gap
     ,Lag(fixed_start)
      Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
            ORDER BY fixed_start DESC) AS next_start
--   ,Max(fixed_start)
--    Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
--          ORDER BY fixed_start DESC
--          ROWS BETWEEN 1 Preceding AND 1 Preceding) AS next_start
      -- used in the outer COALESCE to get the min start for the 1st group
     ,Min(RECORD_START_DT)
      Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO) AS min_start
      -- gap detection
     ,CASE WHEN Cast(RECORD_END_DT AS DATE) + 1 = Cast(next_start AS DATE)
           THEN 0
           ELSE 1
      END AS flag
   FROM
    ( -- fixing the bad data first
      SELECT t.*
        ,CASE WHEN RECORD_START_DT > RECORD_END_DT
              THEN RECORD_START_DT - INTERVAL '1' DAY
              ELSE RECORD_START_DT
         END AS fixed_start
      FROM tab AS t
    ) AS fixed_data
   QUALIFY flag = 1
 ) AS dt
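The inner Select looks for gaps: a row gets flag = 1 when the next row does not start on the day after its RECORD_END_DT. After QUALIFY flag = 1 is applied, each remaining row is the last row of a merged period, so it carries that period's maximum end date, while the previous remaining row holds the matching start date in next_start. The outer Select finally pulls this start date into the current row, falling back to min_start for the first period.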
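To watch the gap detection at work, it can help to run the inner Select without the QUALIFY clause and inspect next_start and flag for every row. The following is a sketch that reuses the names from the query above (tab, fixed_start, next_start, flag). The flags in the final comment were traced by hand from the nine sample rows, not taken from a Teradata run.

```sql
-- Diagnostic sketch: the inner step of the query above, without QUALIFY.
-- Uses LAG as the answer does; on a release without LAG/LEAD, swap in the
-- commented MAX ... ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING variant.
SELECT
   ACCOUNT_ID
  ,EXT_REF_NO
  ,SERIAL_NUM
  ,fixed_start
  ,RECORD_END_DT
   -- start of the chronologically following row, NULL for the last row
  ,Lag(fixed_start)
   Over (PARTITION BY ACCOUNT_ID, EXT_REF_NO
         ORDER BY fixed_start DESC) AS next_start
   -- 0 = the following row continues this period
   -- 1 = a gap follows, or there is no following row
  ,CASE WHEN Cast(RECORD_END_DT AS DATE) + 1 = Cast(next_start AS DATE)
        THEN 0
        ELSE 1
   END AS flag
FROM
 ( -- same bad-data fix as in the answer
   SELECT t.*
     ,CASE WHEN RECORD_START_DT > RECORD_END_DT
           THEN RECORD_START_DT - INTERVAL '1' DAY
           ELSE RECORD_START_DT
      END AS fixed_start
   FROM tab AS t
 ) AS fixed_data
ORDER BY fixed_start;
-- Hand-traced flag values for sample rows 1 to 9: 0, 0, 1, 1, 0, 1, 0, 1, 1
```

The five rows with flag = 1 (rows 3, 4, 6, 8 and 9) are the ones QUALIFY keeps, and their RECORD_END_DT values are exactly the five end dates in the expected output.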