How to count changes within each column in SQL

This is what the table looks like:

id | city    | address         | steps | date
---+---------+-----------------+-------+-----------
 1 | null    | null            | a     | 2021-11-01
 1 | NY      | null            | b     | 2021-11-04
 1 | Chicago | null            | c     | 2021-11-05
 2 | SF      | 33, ABC colony  | x     | 2021-12-01
 2 | SF      | 33, ABC colony  | y     | 2021-12-04
 2 | SF      | 44, Kang Street | z     | 2021-12-05
 3 | Austin  | null            | i     | 2022-01-01
 3 | Austin  | 12, Bridgetown  | j     | 2022-01-04
 3 | Austin  | null            | k     | 2022-01-05

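What I want is, for each id, the total number of updates in the city and address fields only, not counting an initial null. We don't care about the steps column or any updates there.

For id = 1, the city changed from null to NY to Chicago. The address stayed null throughout, but given the dates, I count this as 2. The change from null to NY should not count as an update.

For id = 2, the city never changed; it was always SF. The address did change, but only once, so we again count the updates as 2.

For id = 3, the city never changed, but the address went from null to an address and back to null. We don't count the first null, because the customer may not have had the information yet, but if he/she changes it back to null, that has to be counted. The update count here is also 2.

I'm expecting a result like this: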

id | change_count
---+-------------
 1 |            2
 2 |            2
 3 |            2

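How can I do this in SQL? The main difficulty is not counting the initial null, since I order each id's records by date ascending, while still counting a value that changes back to null. That second part is where I'm mainly confused.

Any help is appreciated. I'm working on it, and if I get the SQL done I'll share it here too.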
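
The difference between a leading null and a value that is later cleared can be made visible by putting each row next to its predecessor. The following is only a sketch under assumptions: it uses the table name main_table and the column names from the sample above, assumes a dialect that supports LAG and a WINDOW clause (PostgreSQL-style syntax), and the labels in address_event are made up for illustration.

```sql
-- Sketch only: label what happened to address on each row, per id, in date order.
-- Assumes main_table(id, city, address, steps, date) as in the sample above.
SELECT
  id
, date
, address
, LAG(address) OVER w AS prev_address
, CASE
    WHEN address IS NULL AND LAG(address) OVER w IS NULL THEN 'still empty' -- leading null, or null after null
    WHEN address IS NULL                                 THEN 'cleared'     -- had a value, now null
    WHEN LAG(address) OVER w IS NULL                     THEN 'filled in'   -- nothing before, value now
    WHEN address = LAG(address) OVER w                   THEN 'unchanged'
    ELSE                                                      'changed'
  END AS address_event
FROM main_table
WINDOW w AS (PARTITION BY id ORDER BY date)
ORDER BY id, date;
```

For id = 3 this labels the three rows 'still empty', 'filled in' and 'cleared', which separates the first null from the later change back to null. The same CASE can be repeated for city.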

I tried a combination of the window function LAG and COALESCE and finally got an answer, but if anyone has a better solution, please suggest it. :)

My SQL:

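with cte1 as (
    -- number each id's rows in date order
    select *,
           row_number() over (partition by id order by date) as rn
    from main_table
),
cte2 as (
    -- drop a first row that has no city and no address yet;
    -- "<> null" is never true, so the null tests use "is null"
    -- (reading the original filter this way is an assumption)
    select *
    from cte1
    where not (rn = 1 and city is null and address is null)
),
cte3 as (
    -- flag a row when the previous value was non-null and differs from the
    -- current one (a change back to null included); null -> value is not flagged
    select id,
           case when coalesce(city, '-1') = coalesce(lag(city, 1) over (partition by id order by date), city, '-1') then 0 else 1 end as cityChange,
           case when coalesce(address, '-1') = coalesce(lag(address, 1) over (partition by id order by date), address, '-1') then 0 else 1 end as addressChange
    from cte2
)
select id,
       sum(cityChange) as cityChangeCount,
       sum(addressChange) as addressChangeCount
from cte3
group by id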
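
A point that matters for filters like the one in cte2: in standard SQL, comparing a column to null with <> evaluates to unknown rather than true, so a WHERE clause that relies on it keeps no rows. A small sketch to check this in isolation, using two made-up inline rows instead of the real table and assuming a dialect that accepts SELECT without FROM:

```sql
-- Sketch with made-up rows: compare "<> NULL" against "IS NOT NULL".
WITH t(city) AS (
            SELECT 'NY'
  UNION ALL SELECT NULL
)
SELECT
  (SELECT COUNT(*) FROM t WHERE city <> NULL)     AS rows_with_ne_null      -- 0: the comparison is never true
, (SELECT COUNT(*) FROM t WHERE city IS NOT NULL) AS rows_with_is_not_null  -- 1: only 'NY'
;
```

IS NULL and IS NOT NULL are the tests that actually distinguish missing values.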

Does this work for you?

WITH
-- your input, do not use in query ...
indata(id,city,addr,steps,dt) AS (
          SELECT 1,NULL     ,NULL             ,'a',DATE '2021-11-01'
UNION ALL SELECT 1,'NY'     ,NULL             ,'b',DATE '2021-11-04'
UNION ALL SELECT 1,'Chicago',NULL             ,'c',DATE '2021-11-05'
UNION ALL SELECT 2,'SF'     ,'33, ABC colony' ,'x',DATE '2021-12-01'
UNION ALL SELECT 2,'SF'     ,'33, ABC colony' ,'y',DATE '2021-12-04'
UNION ALL SELECT 2,'SF'     ,'44, Kang Street','z',DATE '2021-12-05'
UNION ALL SELECT 3,'Austin' ,NULL             ,'i',DATE '2022-01-01'
UNION ALL SELECT 3,'Austin' ,'12, Bridgetown' ,'j',DATE '2022-01-04'
UNION ALL SELECT 3,'Austin' ,NULL             ,'k',DATE '2022-01-05'
)
-- end of your input
-- real query starts here, replace following comma with "WITH" ...
,
olap AS (
  SELECT
    id
  -- a NULL is not COUNTed DISTINCT, but an empty string is
  , CASE WHEN city IS NULL AND LAG(city) OVER w IS NOT NULL THEN '' ELSE city END AS city
  , CASE WHEN addr IS NULL AND LAG(addr) OVER w IS NOT NULL THEN '' ELSE addr END AS addr
  FROM indata
  WINDOW w AS (PARTITION BY id ORDER BY dt)
)
SELECT
  id
, GREATEST(COUNT(DISTINCT city),COUNT(DISTINCT addr)) AS changecount
FROM olap
GROUP BY 1
ORDER BY 1
;
-- out  id | changecount 
-- out ----+-------------
-- out   1 |           2
-- out   2 |           2
-- out   3 |           2
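
COUNT(DISTINCT ...) counts distinct values rather than transitions, so a value that returns to an earlier one (for example NY, Chicago, NY) does not add to the count. Whether such a return should count is for the asker to decide. If it should, one possible variant flags every row on which a column takes a new value and sums the flags. This is a sketch under assumptions, not a tested replacement: it reuses the indata naming from the query above, drops the steps column, adds a hypothetical id 4 purely to show the difference, and assumes a dialect with LAG, a WINDOW clause and GREATEST.

```sql
WITH
-- sample rows; id 4 is hypothetical and only illustrates a value that comes back
indata(id,city,addr,dt) AS (
          SELECT 1,NULL     ,NULL            ,DATE '2021-11-01'
UNION ALL SELECT 1,'NY'     ,NULL            ,DATE '2021-11-04'
UNION ALL SELECT 1,'Chicago',NULL            ,DATE '2021-11-05'
UNION ALL SELECT 3,'Austin' ,NULL            ,DATE '2022-01-01'
UNION ALL SELECT 3,'Austin' ,'12, Bridgetown',DATE '2022-01-04'
UNION ALL SELECT 3,'Austin' ,NULL            ,DATE '2022-01-05'
UNION ALL SELECT 4,'NY'     ,NULL            ,DATE '2022-02-01'
UNION ALL SELECT 4,'Chicago',NULL            ,DATE '2022-02-04'
UNION ALL SELECT 4,'NY'     ,NULL            ,DATE '2022-02-05'
)
,
flags AS (
  SELECT
    id
  -- 0 while a column stays NULL or repeats the previous value, 1 otherwise
  , CASE WHEN city IS NULL AND LAG(city) OVER w IS NULL THEN 0
         WHEN city = LAG(city) OVER w                   THEN 0
         ELSE 1 END AS city_flag
  , CASE WHEN addr IS NULL AND LAG(addr) OVER w IS NULL THEN 0
         WHEN addr = LAG(addr) OVER w                   THEN 0
         ELSE 1 END AS addr_flag
  FROM indata
  WINDOW w AS (PARTITION BY id ORDER BY dt)
)
SELECT
  id
, GREATEST(SUM(city_flag),SUM(addr_flag)) AS changecount
FROM flags
GROUP BY 1
ORDER BY 1
;
```

On these rows the sketch gives 2 for ids 1 and 3, matching the expected result, and 3 for the hypothetical id 4, where the COUNT(DISTINCT ...) query would report 2.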