Insert into and replace old records with new
I have a table that gets its data via Sqoop and is truncated every day.
At the start, tblSqoop has these values:
+----+-------+--------------+---------------+---------+--------+
| id | names | created_date | modified_date | country | number |
+----+-------+--------------+---------------+---------+--------+
| 33 | nick  | 1/1/2020     | 1/1/2020      | Dubai   | 1234   |
| 45 | ted   | 2/7/2020     | 2/7/2020      | Spain   | 12345  |
+----+-------+--------------+---------------+---------+--------+
These rows are then processed by inserting them into tblMaxed.
The next day, tblSqoop has this data:
+----+-------+--------------+---------------+---------+--------+
| id | names | created_date | modified_date | country | number |
+----+-------+--------------+---------------+---------+--------+
| 33 | nick  | 1/1/2020     | 12/31/2020    | Dubai   | 1234   |
| 45 | ted   | 2/7/2020     | 8/19/2020     | Spain   | 12345  |
| 45 | ted   | 2/7/2020     | 9/12/2020     | Spain   | 12345  |
| 45 | ted   | 2/7/2020     | 10/11/2020    | Spain   | 12346  |
| 45 | ted   | 2/7/2020     | 1/1/2021      | Spain   | 12345  |
+----+-------+--------------+---------------+---------+--------+
What I want is for tblMaxed to contain the latest information, like this:
+----+-------+--------------+---------------+---------+--------+-------------+
| id | names | created_date | modified_date | country | number | status_date |
+----+-------+--------------+---------------+---------+--------+-------------+
| 33 | nick  | 1/1/2020     | 12/31/2020    | Dubai   | 1234   | 12/31/2020  |
| 45 | ted   | 2/7/2020     | 10/11/2020    | Spain   | 12346  | 10/11/2020  |
| 45 | ted   | 2/7/2020     | 1/1/2021      | Spain   | 12345  | 1/1/2021    |
+----+-------+--------------+---------------+---------+--------+-------------+
I am running this:
insert into tblMaxed
select
id,
names,
created_date,
modified_date,
country,
number,
MAX(modified_date) as status_date
from tblSqoop
group by id,
names,
created_date,
modified_date,
country,
number
As a result, I get all the records back. Would using a PK help?
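The reason every record comes back: modified_date is itself one of the GROUP BY columns, and no two sample rows share the same values in all six grouped columns, so each group holds exactly one row and MAX(modified_date) is simply that row's own modified_date. A minimal sketch reproducing this, assuming Python's built-in sqlite3 module as a stand-in engine (the question does not name the real one) and with the sample dates rewritten as ISO YYYY-MM-DD strings:

```python
import sqlite3

# In-memory stand-in for tblSqoop (hypothetical setup for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSqoop (id INTEGER, names TEXT, created_date TEXT, "
    "modified_date TEXT, country TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO tblSqoop VALUES (?, ?, ?, ?, ?, ?)",
    [  # the "next day" sample data, dates as ISO strings
        (33, "nick", "2020-01-01", "2020-12-31", "Dubai", 1234),
        (45, "ted", "2020-02-07", "2020-08-19", "Spain", 12345),
        (45, "ted", "2020-02-07", "2020-09-12", "Spain", 12345),
        (45, "ted", "2020-02-07", "2020-10-11", "Spain", 12346),
        (45, "ted", "2020-02-07", "2021-01-01", "Spain", 12345),
    ],
)

# The query from the question: because modified_date is grouped on,
# every row forms its own group and MAX(modified_date) == modified_date.
rows = conn.execute(
    """
    SELECT id, names, created_date, modified_date, country, number,
           MAX(modified_date) AS status_date
    FROM tblSqoop
    GROUP BY id, names, created_date, modified_date, country, number
    """
).fetchall()

print(len(rows))  # 5: one output row per input row
```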
Can you truncate the table and reload tblMaxed using this? (The explanation is in the code.)
select
  id,
  names,
  created_date,
  modified_date,
  country,
  number,
  modified_date as status_date
FROM
  (select t.*,
          -- number the rows within each (id, number) group, newest modified_date first
          row_number() OVER (PARTITION BY id, number ORDER BY modified_date desc) rn
   from tblSqoop t) rs
where rs.rn = 1 -- keeps only the row with the MAX modified_date for each (id, number) from the Sqoop table
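A minimal, runnable sketch of the truncate-and-reload idea, assuming Python's built-in sqlite3 module as a stand-in for the real warehouse (the question does not say which engine Sqoop loads into). SQLite has no TRUNCATE TABLE, so an unqualified DELETE FROM plays that role, and ROW_NUMBER() requires SQLite 3.25 or newer. The helper name reload_tbl_maxed is hypothetical. Dates are stored as ISO YYYY-MM-DD strings on purpose: if modified_date is really an M/D/YYYY string, both MAX() and ORDER BY ... DESC compare it as text, and '9/12/2020' would rank above '1/1/2021'.

```python
import sqlite3

COLUMNS = ("id INTEGER, names TEXT, created_date TEXT, "
           "modified_date TEXT, country TEXT, number INTEGER")

# The answer's ROW_NUMBER() query, used as the source of an INSERT.
RELOAD_SQL = """
INSERT INTO tblMaxed
SELECT id, names, created_date, modified_date, country, number,
       modified_date AS status_date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id, number
                              ORDER BY modified_date DESC) AS rn
    FROM tblSqoop t
) rs
WHERE rs.rn = 1
"""


def reload_tbl_maxed(conn):
    """Hypothetical helper: empty tblMaxed, then refill it from tblSqoop."""
    with conn:  # one transaction; commits only if both statements succeed
        # SQLite has no TRUNCATE TABLE; an unqualified DELETE empties the table.
        conn.execute("DELETE FROM tblMaxed")
        conn.execute(RELOAD_SQL)


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE tblSqoop ({COLUMNS})")
conn.execute(f"CREATE TABLE tblMaxed ({COLUMNS}, status_date TEXT)")
conn.executemany(
    "INSERT INTO tblSqoop VALUES (?, ?, ?, ?, ?, ?)",
    [  # the "next day" sample data, dates rewritten as ISO strings
        (33, "nick", "2020-01-01", "2020-12-31", "Dubai", 1234),
        (45, "ted", "2020-02-07", "2020-08-19", "Spain", 12345),
        (45, "ted", "2020-02-07", "2020-09-12", "Spain", 12345),
        (45, "ted", "2020-02-07", "2020-10-11", "Spain", 12346),
        (45, "ted", "2020-02-07", "2021-01-01", "Spain", 12345),
    ],
)
conn.commit()

reload_tbl_maxed(conn)
reload_tbl_maxed(conn)  # a second daily run replaces the rows instead of duplicating them

result = conn.execute("SELECT * FROM tblMaxed ORDER BY id, status_date").fetchall()
for row in result:
    print(row)
```

Running the reload twice should still leave exactly the three rows of the desired tblMaxed output (with ISO-formatted dates). That is the reason to empty the table first: a plain INSERT INTO on every run would keep appending another copy of the latest rows.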