How to count changes within each column, per id, in SQL
This is what the table looks like:
id | city | address | steps | date |
---|---|---|---|---|
1 | null | null | a | 2021-11-01 |
1 | NY | null | b | 2021-11-04 |
1 | Chicago | null | c | 2021-11-05 |
2 | SF | 33, ABC colony | x | 2021-12-01 |
2 | SF | 33, ABC colony | y | 2021-12-04 |
2 | SF | 44, Kang Street | z | 2021-12-05 |
3 | Austin | null | i | 2022-01-01 |
3 | Austin | 12, Bridgetown | j | 2022-01-04 |
3 | Austin | null | k | 2022-01-05 |
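What I want is the total number of times any `id` has had an update in just the city and address fields, not counting null. We don't care about the steps column or any updates to it.
For id = 1, the city changed from null to NY to Chicago. The address stayed null, but given the dates I would count this as 2. The change from null to NY should not be counted as an update.
For id = 2, the city never changed; it was always SF. The address did change, but only once, so we again count the updates as 2.
For id = 3, the city never changed, but the address changed from null to an address and back to null. We don't count the first null, because the customer may not have had the information yet, but if he/she changed it back to null, that has to be counted. Here the update count would also be 2.
I expect a result like this: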
id | change_count |
---|---|
1 | 2 |
2 | 2 |
3 | 2 |
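The counting rule described above can be read as follows: per id and per column, walk the rows in date order, skip any null that appears before the first real value, count the distinct states that remain (a later null counts as a state), and take the larger of the two column counts. The Python sketch below is only one reading of that rule, chosen because it reproduces the expected table; the names `ROWS`, `states` and `change_count` are made up for illustration, and `None` stands for null.

```python
from collections import defaultdict

# Sample rows from the question: (id, city, address, date); None stands for NULL.
ROWS = [
    (1, None, None, "2021-11-01"),
    (1, "NY", None, "2021-11-04"),
    (1, "Chicago", None, "2021-11-05"),
    (2, "SF", "33, ABC colony", "2021-12-01"),
    (2, "SF", "33, ABC colony", "2021-12-04"),
    (2, "SF", "44, Kang Street", "2021-12-05"),
    (3, "Austin", None, "2022-01-01"),
    (3, "Austin", "12, Bridgetown", "2022-01-04"),
    (3, "Austin", None, "2022-01-05"),
]


def states(values):
    """Count distinct states in order, ignoring NULLs before the first real value."""
    seen = set()
    started = False
    for v in values:
        if v is None and not started:
            continue  # leading NULL: no value had been supplied yet
        started = True
        seen.add(v)  # a later None is kept, so "changed back to NULL" counts
    return len(seen)


def change_count(rows):
    """Return {id: max(city states, address states)} with rows taken in date order."""
    by_id = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[0], r[3])):
        by_id[row[0]].append(row)
    return {
        i: max(states([r[1] for r in rs]), states([r[2] for r in rs]))
        for i, rs in by_id.items()
    }


print(change_count(ROWS))  # {1: 2, 2: 2, 3: 2}
```

For id = 3 the address column goes null, '12, Bridgetown', null: the first null is skipped and the last one is kept, which gives 2.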
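May I know how to do this in SQL? The main problem is not counting null, since I sort each id's records in ascending order of when they appeared; counting the case where a value goes back to null is what mainly confuses me.
Any help is appreciated. I'm working on it, and if I finish the SQL I will share it here too.
I tried a combination of the LAG window function and COALESCE and finally got an answer, but if anyone has a better solution, please suggest it. :)
My SQL: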
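with cte1 as (
    -- number each id's rows in date order
    select *,
           row_number() over (partition by id order by date) as rn
    from main_table
),
cte2 as (
    -- assumed intent: skip a first row in which both city and address are still NULL
    -- (NULL has to be tested with IS [NOT] NULL; "<> null" is never true)
    select *
    from cte1
    where rn > 1 or city is not null or address is not null
),
cte3 as (
    -- flag a row whose value differs from the previous row of the same id;
    -- when the previous value is NULL the row is compared with itself, so it is not flagged
    select id,
           case when coalesce(city, '-1') = coalesce(lag(city, 1) over (partition by id order by date), city, '-1') then 0 else 1 end as cityChange,
           case when coalesce(address, '-1') = coalesce(lag(address, 1) over (partition by id order by date), address, '-1') then 0 else 1 end as addressChange
    from cte2
)
select id,
       sum(cityChange) as cityChangeCount,
       sum(addressChange) as addressChangeCount
from cte3
group by id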
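To see which transitions the LAG/COALESCE comparison in `cte3` flags, the same expression can be mirrored in Python. This is a sketch for illustration only: `coalesce` and `is_change` are hypothetical helpers, `None` stands for NULL, and `'-1'` is the same sentinel used in the query.

```python
def coalesce(*args):
    """Return the first argument that is not None, like SQL COALESCE."""
    for a in args:
        if a is not None:
            return a
    return None


def is_change(prev, cur):
    """Mirror of: CASE WHEN COALESCE(cur,'-1') = COALESCE(prev, cur, '-1') THEN 0 ELSE 1 END."""
    return coalesce(cur, "-1") != coalesce(prev, cur, "-1")


print(is_change(None, "NY"))              # False: NULL -> value is not flagged
print(is_change("NY", "Chicago"))         # True:  value -> different value
print(is_change("12, Bridgetown", None))  # True:  value -> NULL is flagged
print(is_change(None, None))              # False: NULL stays NULL
```

The first row of each id has no predecessor, so `LAG` returns NULL there and the row behaves like the `prev = None` cases.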
Does this work for you?
WITH
-- your input, do not use in query ...
indata(id,city,addr,steps,dt) AS (
            SELECT 1,NULL     ,NULL             ,'a',DATE '2021-11-01'
  UNION ALL SELECT 1,'NY'     ,NULL             ,'b',DATE '2021-11-04'
  UNION ALL SELECT 1,'Chicago',NULL             ,'c',DATE '2021-11-05'
  UNION ALL SELECT 2,'SF'     ,'33, ABC colony' ,'x',DATE '2021-12-01'
  UNION ALL SELECT 2,'SF'     ,'33, ABC colony' ,'y',DATE '2021-12-04'
  UNION ALL SELECT 2,'SF'     ,'44, Kang Street','z',DATE '2021-12-05'
  UNION ALL SELECT 3,'Austin' ,NULL             ,'i',DATE '2022-01-01'
  UNION ALL SELECT 3,'Austin' ,'12, Bridgetown' ,'j',DATE '2022-01-04'
  UNION ALL SELECT 3,'Austin' ,NULL             ,'k',DATE '2022-01-05'
)
-- end of your input
-- real query starts here, replace following comma with "WITH" ...
,
olap AS (
  SELECT
    id
    -- a NULL is not COUNTed DISTINCT, but an empty string is
  , CASE WHEN city IS NULL AND LAG(city) OVER w IS NOT NULL THEN '' ELSE city END AS city
  , CASE WHEN addr IS NULL AND LAG(addr) OVER w IS NOT NULL THEN '' ELSE addr END AS addr
  FROM indata
  WINDOW w AS (PARTITION BY id ORDER BY dt)
)
SELECT
  id
, GREATEST(COUNT(DISTINCT city),COUNT(DISTINCT addr)) AS changecount
FROM olap
GROUP BY 1
ORDER BY 1
;
-- out id | changecount
-- out ----+-------------
-- out 1 | 2
-- out 2 | 2
-- out 3 | 2
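For trying the query without a database server, here is a minimal sketch that runs the same idea in SQLite through Python's `sqlite3` module. It assumes a SQLite build with window-function support (3.25 or later). Three adaptations are not part of the answer above: dates are stored as ISO text because SQLite has no `DATE '...'` literal, the named `WINDOW` clause is inlined, and the two-argument scalar `MAX` stands in for `GREATEST`, which SQLite does not provide. The extra `counts` CTE is likewise only there for the sketch.

```python
import sqlite3

# Sample rows from the question: (id, city, addr, steps, dt); None becomes NULL.
ROWS = [
    (1, None, None, "a", "2021-11-01"),
    (1, "NY", None, "b", "2021-11-04"),
    (1, "Chicago", None, "c", "2021-11-05"),
    (2, "SF", "33, ABC colony", "x", "2021-12-01"),
    (2, "SF", "33, ABC colony", "y", "2021-12-04"),
    (2, "SF", "44, Kang Street", "z", "2021-12-05"),
    (3, "Austin", None, "i", "2022-01-01"),
    (3, "Austin", "12, Bridgetown", "j", "2022-01-04"),
    (3, "Austin", None, "k", "2022-01-05"),
]

QUERY = """
WITH olap AS (
    SELECT
        id,
        -- NULL is skipped by COUNT(DISTINCT ...), an empty string is not
        CASE WHEN city IS NULL
              AND LAG(city) OVER (PARTITION BY id ORDER BY dt) IS NOT NULL
             THEN '' ELSE city END AS city,
        CASE WHEN addr IS NULL
              AND LAG(addr) OVER (PARTITION BY id ORDER BY dt) IS NOT NULL
             THEN '' ELSE addr END AS addr
    FROM indata
),
counts AS (
    SELECT id,
           COUNT(DISTINCT city) AS city_states,
           COUNT(DISTINCT addr) AS addr_states
    FROM olap
    GROUP BY id
)
SELECT id, MAX(city_states, addr_states) AS changecount
FROM counts
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indata (id INTEGER, city TEXT, addr TEXT, steps TEXT, dt TEXT)"
)
conn.executemany("INSERT INTO indata VALUES (?, ?, ?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 2), (2, 2), (3, 2)]
```

Note that the empty-string trick relies on the engine treating `''` and NULL as different values, which should be checked before using it on another database.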