Join a table to gaps-and-islands data in Teradata SQL

I have data stored as gaps and islands in the database, and I need to join it with another table. The gaps-and-islands table looks like this:

|Subs_ID|ORIGINAL_STATUS|NEW_STATUS|CHANGE_DATE|
|-------+---------------+----------+-----------|
|123456 |1              |2         |12/2/2017  |
|123456 |2              |3         |12/8/2019  |
|123456 |3              |4         |12/18/2019 |
|123456 |4              |8         |12/28/2019 |
|123456 |8              |9         |10/4/2020  |

The second table contains only Subs_ID and a sequence of CONNECT_DATE values:

|Subs_ID|CONNECT_DATE|
|-------+------------|
|123456 |12/1/2017   |
|123456 |12/3/2017   |
|123456 |11/4/2018   |
|123456 |10/5/2019   |
|123456 |12/30/2019  |
|123456 |10/4/2020   |
|123456 |5/21/2021   |

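I need to join the current STATUS from the first table to the second one using Subs_ID and the dates. If the Connect_date is earlier than the first Change_Date, it should take the first ORIGINAL_STATUS; if the Connect_date is later than a Change_Date, it should take the last NEW_STATUS before it. Judging by the expected output, a Connect_date equal to a Change_Date (10/4/2020) still gets the status from before that change. The result should look like this: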

|Subs_ID|CONNECT_DATE|Status|
|-------+------------+------|
|123456 |12/1/2017   |1     |
|123456 |12/3/2017   |2     |
|123456 |11/4/2018   |2     |
|123456 |10/5/2019   |2     |
|123456 |12/30/2019  |8     |
|123456 |10/4/2020   |8     |
|123456 |5/21/2021   |9     |
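For anyone who wants to reproduce this, the sample data could be loaded roughly as follows. This is only a sketch: the table names t1 and t2, the column types, and the choice of volatile tables are assumptions, and the dates are read as month/day/year.

```sql
-- Assumed names and types; volatile tables exist only for the current session.
CREATE MULTISET VOLATILE TABLE t1 (
   Subs_ID         INTEGER
  ,ORIGINAL_STATUS INTEGER
  ,NEW_STATUS      INTEGER
  ,CHANGE_DATE     DATE
) PRIMARY INDEX (Subs_ID)
ON COMMIT PRESERVE ROWS;

CREATE MULTISET VOLATILE TABLE t2 (
   Subs_ID      INTEGER
  ,CONNECT_DATE DATE
) PRIMARY INDEX (Subs_ID)
ON COMMIT PRESERVE ROWS;

-- status changes (the gaps-and-islands table)
INSERT INTO t1 VALUES (123456, 1, 2, DATE '2017-12-02');
INSERT INTO t1 VALUES (123456, 2, 3, DATE '2019-12-08');
INSERT INTO t1 VALUES (123456, 3, 4, DATE '2019-12-18');
INSERT INTO t1 VALUES (123456, 4, 8, DATE '2019-12-28');
INSERT INTO t1 VALUES (123456, 8, 9, DATE '2020-10-04');

-- connect dates
INSERT INTO t2 VALUES (123456, DATE '2017-12-01');
INSERT INTO t2 VALUES (123456, DATE '2017-12-03');
INSERT INTO t2 VALUES (123456, DATE '2018-11-04');
INSERT INTO t2 VALUES (123456, DATE '2019-10-05');
INSERT INTO t2 VALUES (123456, DATE '2019-12-30');
INSERT INTO t2 VALUES (123456, DATE '2020-10-04');
INSERT INTO t2 VALUES (123456, DATE '2021-05-21');
```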

I usually solve this kind of problem in a way that avoids a non-equi join:

SELECT
   Subs_ID
  ,dt AS CONNECT_DATE
   -- fill the NULLs:
   -- if the Connect_date is later than the Change_Date, take the last NEW_STATUS
  ,Coalesce(Lag(NEW_STATUS IGNORE NULLS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC) -- on equal dates the t2 row sorts first, so 10/4/2020 gets 8
   -- if the Connect_date is earlier than the Change_Date, take the first ORIGINAL_STATUS
           ,Lead(ORIGINAL_STATUS IGNORE NULLS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC)
           ) AS Current_Status
FROM
 ( -- combine both tables
   SELECT
      1 AS x -- flag indicating the source table
     ,Subs_ID
     ,ORIGINAL_STATUS
     ,NEW_STATUS
     ,CHANGE_DATE AS dt
   FROM t1
   UNION ALL
   SELECT
      2 AS x
     ,Subs_ID
     ,NULL -- to get the same number of columns
     ,NULL -- to get the same number of columns
     ,CONNECT_DATE
   FROM t2
 ) AS t
QUALIFY x = 2 -- return only rows from t2
ORDER BY CONNECT_DATE
;

To see how it works, comment out the QUALIFY.
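As a rough illustration of that suggestion, the variant below leaves the QUALIFY commented out and exposes the two window results as separate columns, so rows from both tables and the intermediate values stay visible. The aliases prev_new_status and next_original_status are made up for this sketch, and the x DESC tie-break is an assumption added here so that a connect date equal to a change date sorts before the change row.

```sql
SELECT
   x
  ,Subs_ID
  ,dt
  ,ORIGINAL_STATUS
  ,NEW_STATUS
   -- last NEW_STATUS seen before this row
  ,Lag(NEW_STATUS IGNORE NULLS)
   Over (PARTITION BY Subs_ID
         ORDER BY dt, x DESC) AS prev_new_status
   -- first ORIGINAL_STATUS found after this row
  ,Lead(ORIGINAL_STATUS IGNORE NULLS)
   Over (PARTITION BY Subs_ID
         ORDER BY dt, x DESC) AS next_original_status
FROM
 (
   SELECT 1 AS x, Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE AS dt
   FROM t1
   UNION ALL
   SELECT 2 AS x, Subs_ID, NULL, NULL, CONNECT_DATE
   FROM t2
 ) AS t
-- QUALIFY x = 2   -- left commented out on purpose
ORDER BY Subs_ID, dt, x DESC
;
```

With the question's data, the 12/1/2017 row has no earlier change, so prev_new_status is NULL there and the final value has to come from next_original_status.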

If your data allows you to drop the IGNORE NULLS from the LEAD, the query will be more efficient (a single STAT step instead of two).
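The answer does not say when the data "allows" this. One plausible reading, which is an assumption here rather than something the answer states, is that each Subs_ID has at most one connect date on or before its first change date, so the row immediately following it is already a change row. Under that assumption the query might be simplified like this (the alias Current_Status and the x DESC tie-break are also additions made for this sketch):

```sql
SELECT
   Subs_ID
  ,dt AS CONNECT_DATE
  ,Coalesce(Lag(NEW_STATUS IGNORE NULLS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC)
            -- no IGNORE NULLS: only looks at the very next row,
            -- which is assumed to be a row from t1
           ,Lead(ORIGINAL_STATUS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC)
           ) AS Current_Status
FROM
 (
   SELECT 1 AS x, Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE AS dt
   FROM t1
   UNION ALL
   SELECT 2 AS x, Subs_ID, NULL, NULL, CONNECT_DATE
   FROM t2
 ) AS t
QUALIFY x = 2
ORDER BY CONNECT_DATE
;
```

If two or more connect dates can precede the first change, the earlier ones would get NULL from this version, so the IGNORE NULLS form is the safer default.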

I hope this query helps. A breakdown and explanation of the code is given below.

select a.*,B.ORIGINAL_STATUS
from mytable2 a
join (select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,LAG(CHANGE_DATE) OVER(PARTITION BY Subs_ID ORDER BY CHANGE_DATE) as previous_date_value
from mytable)b
on a.Subs_ID =b.Subs_ID and a.CONNECT_DATE >b.previous_date_value and a.CONNECT_DATE<= b.CHANGE_DATE

UNION

select * from (
select a.*,CASE WHEN A.CONNECT_dATE<B.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)less_than_available_date
where status is not null

UNION

select * from (
select a.*,CASE WHEN A.CONNECT_dATE>B.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE desc)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)greater_than_available_date
where status is not null

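Explanation: the first code block returns the status for a connect date when that date falls within the range of available change dates. I am using the LAG function to get the previous change date and compare it with the connect date.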

select a.*,B.ORIGINAL_STATUS
from mytable2 a
join (select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,LAG(CHANGE_DATE) OVER(PARTITION BY Subs_ID ORDER BY CHANGE_DATE) as previous_date_value
from mytable)b
on a.Subs_ID =b.Subs_ID and a.CONNECT_DATE >b.previous_date_value and a.CONNECT_DATE<= b.CHANGE_DATE

This gives us the following result set:

+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
|  123456 | 2017-12-03   |               2 |
|  123456 | 2018-11-04   |               2 |
|  123456 | 2019-10-05   |               2 |
|  123456 | 2019-12-30   |               8 |
|  123456 | 2020-10-04   |               8 |
+---------+--------------+-----------------+

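Now we need to find the status for 12/1/2017, which is earlier than any available change date. Table 2 is joined with the row of table 1 that has the minimum change date; if the connect date is earlier than that change date, ORIGINAL_STATUS is taken.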

select * from (
select a.*,CASE WHEN A.CONNECT_dATE<B.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() over(partition by Subs_ID order by CHANGE_DATE)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)less_than_available_date
where status is not null

+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
|  123456 | 2017-12-01   |      1 |
+---------+--------------+--------+

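What remains are the records whose connect date is later than every available change date. The following code handles them: table 2 is joined with the row of table 1 that has the maximum change date, and if the connect date is later than that change date, NEW_STATUS is taken.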

select * from (
select a.*,CASE WHEN A.CONNECT_dATE>B.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL 
END AS STATUS
from mytable2 a
join (
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE from
(
select Subs_ID,ORIGINAL_STATUS,NEW_STATUS,CHANGE_DATE,row_number() 
over(partition by Subs_ID order by CHANGE_DATE desc)rn
from mytable
)mytable_less
where rn=1)b
on a.Subs_ID=b.Subs_ID)greater_than_available_date
where status is not null

+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
|  123456 | 2021-05-21   |      9 |
+---------+--------------+--------+

Finally, after applying the UNIONs, we get the desired result set:

+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
|  123456 | 2017-12-01   |               1 |
|  123456 | 2017-12-03   |               2 |
|  123456 | 2018-11-04   |               2 |
|  123456 | 2019-10-05   |               2 |
|  123456 | 2019-12-30   |               8 |
|  123456 | 2020-10-04   |               8 |
|  123456 | 2021-05-21   |               9 |
+---------+--------------+-----------------+