Join a table to Gaps and Islands data in SQL (Teradata)
I have data stored as gaps and islands in the database, and I need to join it to another table.
The gaps-and-islands table looks like this:
|Subs_ID|ORIGINAL_STATUS|NEW_STATUS|CHANGE_DATE|
|-------+---------------+----------+-----------|
|123456 |1 |2 |12/2/2017 |
|123456 |2 |3 |12/8/2019 |
|123456 |3 |4 |12/18/2019 |
|123456 |4 |8 |12/28/2019 |
|123456 |8 |9 |10/4/2020 |
The second table contains only Subs_ID and sequential Connect_date values:
|Subs_ID|CONNECT_DATE|
|-------+------------|
|123456 |12/1/2017 |
|123456 |12/3/2017 |
|123456 |11/4/2018 |
|123456 |10/5/2019 |
|123456 |12/30/2019 |
|123456 |10/4/2020 |
|123456 |5/21/2021 |
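I need to join the current STATUS from the first table to the second table using subs_id and the dates. If the Connect_date is less than the Change_Date, it should take the first ORIGINAL_STATUS, and if the Connect_date is greater than the Change_Date, it should take the last NEW_STATUS, so the result will look like this: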
|Subs_ID|CONNECT_DATE|Status|
|-------+------------+------|
|123456 |12/1/2017 |1 |
|123456 |12/3/2017 |2 |
|123456 |11/4/2018 |2 |
|123456 |10/5/2019 |2 |
|123456 |12/30/2019 |8 |
|123456 |10/4/2020 |8 |
|123456 |5/21/2021 |9 |
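To make the sample reproducible, the two tables above could be loaded as volatile tables along these lines. This is only a sketch: the table names t1 and t2 and the column types (INTEGER, DATE) are assumptions, since the question does not state them, and the dates are read as month/day/year.

```sql
-- Assumed names and types; adjust to the real tables.
CREATE VOLATILE TABLE t1 (
    Subs_ID         INTEGER,
    ORIGINAL_STATUS INTEGER,
    NEW_STATUS      INTEGER,
    CHANGE_DATE     DATE
) ON COMMIT PRESERVE ROWS;

CREATE VOLATILE TABLE t2 (
    Subs_ID      INTEGER,
    CONNECT_DATE DATE
) ON COMMIT PRESERVE ROWS;

-- Status changes (the gaps-and-islands table)
INSERT INTO t1 VALUES (123456, 1, 2, DATE '2017-12-02');
INSERT INTO t1 VALUES (123456, 2, 3, DATE '2019-12-08');
INSERT INTO t1 VALUES (123456, 3, 4, DATE '2019-12-18');
INSERT INTO t1 VALUES (123456, 4, 8, DATE '2019-12-28');
INSERT INTO t1 VALUES (123456, 8, 9, DATE '2020-10-04');

-- Connect dates
INSERT INTO t2 VALUES (123456, DATE '2017-12-01');
INSERT INTO t2 VALUES (123456, DATE '2017-12-03');
INSERT INTO t2 VALUES (123456, DATE '2018-11-04');
INSERT INTO t2 VALUES (123456, DATE '2019-10-05');
INSERT INTO t2 VALUES (123456, DATE '2019-12-30');
INSERT INTO t2 VALUES (123456, DATE '2020-10-04');
INSERT INTO t2 VALUES (123456, DATE '2021-05-21');
```

Note that the connect date 10/4/2020 falls on the same day as the 8 to 9 change and is expected to return 8, so a change only counts for connect dates strictly after it.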
I usually solve this kind of problem in a way that avoids a non-equi join:
SELECT
   Subs_ID
  ,dt AS CONNECT_DATE
   -- fill the NULLs
   -- if the Connect_date is later than a Change_Date, take the last NEW_STATUS
  ,Coalesce(Lag(NEW_STATUS IGNORE NULLS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC) -- x DESC: on equal dates the t2 row sorts first (10/4/2020 must return 8)
            -- if the Connect_date is earlier than every Change_Date, take the first ORIGINAL_STATUS
           ,Lead(ORIGINAL_STATUS IGNORE NULLS)
            Over (PARTITION BY Subs_ID
                  ORDER BY dt, x DESC)
           ) AS Status
FROM
 ( -- combine both tables
   SELECT
      1 AS x -- flag indicating the source table
     ,Subs_ID
     ,ORIGINAL_STATUS
     ,NEW_STATUS
     ,CHANGE_DATE AS dt
   FROM t1
   UNION ALL
   SELECT
      2 AS x
     ,Subs_ID
     ,NULL -- to get the same number of columns
     ,NULL -- to get the same number of columns
     ,CONNECT_DATE
   FROM t2
 ) AS t
QUALIFY x = 2 -- return only rows from t2
ORDER BY CONNECT_DATE
;
To see how it works, comment out the QUALIFY.
If your data allows you to remove the IGNORE NULLS in the LEAD, it will be more efficient (only a single STAT step instead of two).
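As a rough illustration of what commenting out the QUALIFY reveals, the sketch below lists every combined row with the two window results side by side. The aliases last_new_status and next_original_status are made up for this example. The x DESC tie-breaker in the ORDER BY makes a t2 row sort ahead of a t1 row on the same date, which the expected result for 10/4/2020 implies; treat it as an assumption if ties in your data should resolve the other way.

```sql
SELECT
   x
  ,Subs_ID
  ,dt
  ,ORIGINAL_STATUS
  ,NEW_STATUS
   -- most recent non-NULL NEW_STATUS on an earlier row
  ,Lag(NEW_STATUS IGNORE NULLS)
   Over (PARTITION BY Subs_ID
         ORDER BY dt, x DESC) AS last_new_status
   -- next non-NULL ORIGINAL_STATUS on a later row
  ,Lead(ORIGINAL_STATUS IGNORE NULLS)
   Over (PARTITION BY Subs_ID
         ORDER BY dt, x DESC) AS next_original_status
FROM
 (
   SELECT 1 AS x, Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE AS dt
   FROM t1
   UNION ALL
   SELECT 2 AS x, Subs_ID, NULL, NULL, CONNECT_DATE
   FROM t2
 ) AS t
ORDER BY Subs_ID, dt, x DESC
;
```

For the rows with x = 2, last_new_status is NULL only when no status change precedes the connect date (12/1/2017 in the sample), and that is the one case where the Coalesce falls back to next_original_status.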
I hope this query helps you. A breakdown and explanation of the code is given below.
select a.*, b.ORIGINAL_STATUS
from mytable2 a
join (select m.*,
             -- PARTITION BY added so the previous date comes from the same subscriber
             LAG(CHANGE_DATE) OVER (PARTITION BY Subs_ID ORDER BY CHANGE_DATE) as previous_date_value
      from mytable m) b
  on a.Subs_ID = b.Subs_ID
 and a.CONNECT_DATE > b.previous_date_value
 and a.CONNECT_DATE <= b.CHANGE_DATE
UNION
select * from (
    select a.*, CASE WHEN a.CONNECT_DATE < b.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
    from mytable2 a
    join (
        select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE from
        (
            select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE,
                   row_number() over (partition by Subs_ID order by CHANGE_DATE) rn
            from mytable
        ) mytable_less
        where rn = 1) b
      on a.Subs_ID = b.Subs_ID) less_than_available_date
where status is not null
UNION
select * from (
    select a.*, CASE WHEN a.CONNECT_DATE > b.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL END AS STATUS
    from mytable2 a
    join (
        select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE from
        (
            select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE,
                   row_number() over (partition by Subs_ID order by CHANGE_DATE desc) rn
            from mytable
        ) mytable_less
        where rn = 1) b
      on a.Subs_ID = b.Subs_ID) greater_than_available_date
where status is not null
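Explanation: the first code block returns the status for every connect date that falls within the range of available change dates. I am using the LAG function to get the previous change date and compare it with the connect date.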
select a.*, b.ORIGINAL_STATUS
from mytable2 a
join (select m.*,
             -- PARTITION BY added so the previous date comes from the same subscriber
             LAG(CHANGE_DATE) OVER (PARTITION BY Subs_ID ORDER BY CHANGE_DATE) as previous_date_value
      from mytable m) b
  on a.Subs_ID = b.Subs_ID
 and a.CONNECT_DATE > b.previous_date_value
 and a.CONNECT_DATE <= b.CHANGE_DATE
This gives us the following result set:
+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
| 123456 | 2017-12-03 | 2 |
| 123456 | 2018-11-04 | 2 |
| 123456 | 2019-10-05 | 2 |
| 123456 | 2019-12-30 | 8 |
| 123456 | 2020-10-04 | 8 |
+---------+--------------+-----------------+
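To see why exactly these five rows match, it may help to look at the date intervals that the derived table produces on its own. The query below is a minimal sketch that reuses the mytable name and the previous_date_value alias from the answer; the PARTITION BY Subs_ID is an assumption that each subscriber should be handled separately.

```sql
-- Each row covers the interval (previous_date_value, CHANGE_DATE]
-- and carries the status that was valid during that interval.
select Subs_ID,
       ORIGINAL_STATUS,
       LAG(CHANGE_DATE) OVER (PARTITION BY Subs_ID ORDER BY CHANGE_DATE) as previous_date_value,
       CHANGE_DATE
from mytable
order by Subs_ID, CHANGE_DATE
```

With the sample data, the interval from 2017-12-02 (exclusive) to 2019-12-08 (inclusive) carries status 2, and the interval from 2019-12-28 to 2020-10-04 carries status 8. The earliest change has a NULL previous_date_value, so the comparison against it is never true and no connect date can match that row, including one that equals the earliest CHANGE_DATE. Connect dates after the last CHANGE_DATE fall outside every interval as well.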
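Now we need the status for 12/1/2017, which is earlier than any available change date. Table 2 is joined to the row of table 1 that has the minimum change date, and if the connect date is less than that change date, ORIGINAL_STATUS is taken.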
select * from (
    select a.*, CASE WHEN a.CONNECT_DATE < b.CHANGE_DATE THEN b.ORIGINAL_STATUS ELSE NULL END AS STATUS
    from mytable2 a
    join (
        select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE from
        (
            select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE,
                   row_number() over (partition by Subs_ID order by CHANGE_DATE) rn
            from mytable
        ) mytable_less
        where rn = 1) b
      on a.Subs_ID = b.Subs_ID) less_than_available_date
where status is not null
+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
| 123456 | 2017-12-01 | 1 |
+---------+--------------+--------+
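What remains are the records whose connect date is greater than every available change date. These are handled by the following code: table 2 is joined to the row of table 1 that has the maximum change date, and if the connect date is greater than that change date, NEW_STATUS is taken.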
select * from (
    select a.*, CASE WHEN a.CONNECT_DATE > b.CHANGE_DATE THEN b.NEW_STATUS ELSE NULL END AS STATUS
    from mytable2 a
    join (
        select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE from
        (
            select Subs_ID, ORIGINAL_STATUS, NEW_STATUS, CHANGE_DATE,
                   row_number() over (partition by Subs_ID order by CHANGE_DATE desc) rn
            from mytable
        ) mytable_less
        where rn = 1) b
      on a.Subs_ID = b.Subs_ID) greater_than_available_date
where status is not null
+---------+--------------+--------+
| Subs_ID | CONNECT_DATE | STATUS |
+---------+--------------+--------+
| 123456 | 2021-05-21 | 9 |
+---------+--------------+--------+
Finally, after applying the union, we get the required result set:
+---------+--------------+-----------------+
| Subs_ID | CONNECT_DATE | ORIGINAL_STATUS |
+---------+--------------+-----------------+
| 123456 | 2017-12-01 | 1 |
| 123456 | 2017-12-03 | 2 |
| 123456 | 2018-11-04 | 2 |
| 123456 | 2019-10-05 | 2 |
| 123456 | 2019-12-30 | 8 |
| 123456 | 2020-10-04 | 8 |
| 123456 | 2021-05-21 | 9 |
+---------+--------------+-----------------+