Replace NULL values per partition

For each session_id, I want to fill the NULL values in the device column with the associated non-null value. How can I achieve this?

Sample data:

+------------+-------+---------+
| session_id | step  | device  |
+------------+-------+---------+
| 351acc     | step1 |         |
| 351acc     | step2 |         |
| 351acc     | step3 | mobile  |
| 351acc     | step4 | mobile  |
| 350bca     | step1 | desktop |
| 350bca     | step2 |         |
| 350bca     | step3 |         |
| 350bca     | step4 | desktop |
+------------+-------+---------+

Expected output:

+------------+-------+---------+
| session_id | step  | device  |
+------------+-------+---------+
| 351acc     | step1 | mobile  |
| 351acc     | step2 | mobile  |
| 351acc     | step3 | mobile  |
| 351acc     | step4 | mobile  |
| 350bca     | step1 | desktop |
| 350bca     | step2 | desktop |
| 350bca     | step3 | desktop |
| 350bca     | step4 | desktop |
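+------------+-------+---------+

-- No ORDER BY inside the window: max() then sees the whole partition,
-- not just a running frame that ends at the current row.
select session_id, step,
       coalesce(device, max(device) over (partition by session_id)) as device
from tbl;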

Judging by your sample data, there is one device per session, so you can use a correlated subquery to fetch the value from the other rows:

WITH j (session_id, step, device) AS (
  VALUES ('351acc','step1',NULL),
         ('351acc','step2',NULL),
         ('351acc','step3','mobile'),
         ('351acc','step4','mobile'),
         ('350bca','step1','desktop'),
         ('350bca','step2',NULL),
         ('350bca','step3',NULL),
         ('350bca','step4','desktop')
) 
SELECT session_id,step,
  (SELECT DISTINCT device 
   FROM j q2
   WHERE q2.session_id = q1.session_id AND q2.device IS NOT NULL) AS device
FROM j q1 ORDER BY session_id,step;

 session_id | step  | device  
------------+-------+---------
 350bca     | step1 | desktop
 350bca     | step2 | desktop
 350bca     | step3 | desktop
 350bca     | step4 | desktop
 351acc     | step1 | mobile
 351acc     | step2 | mobile
 351acc     | step3 | mobile
 351acc     | step4 | mobile
(8 rows)

Demo: db<>fiddle
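
The scalar subquery above relies on there being exactly one distinct non-null device per session. If a session carried two different devices, PostgreSQL would reject it with "more than one row returned by a subquery used as an expression". A minimal sketch of a more tolerant variant follows. The 'tablet' row is hypothetical data added only to illustrate the case, and picking the smallest value with min() is an arbitrary tie-break, not something the question asks for:

```sql
WITH j (session_id, step, device) AS (
  VALUES ('351acc','step1',NULL),
         ('351acc','step2','tablet'),  -- hypothetical second device in the same session
         ('351acc','step3','mobile')
)
SELECT session_id, step,
       -- Keep the row's own device; only fill NULLs from the rest of the session.
       COALESCE(device,
                (SELECT min(q2.device)   -- aggregates ignore NULL and always return one row
                 FROM j q2
                 WHERE q2.session_id = q1.session_id)) AS device
FROM j q1
ORDER BY session_id, step;
```

Here step1 is filled with 'mobile', while step2 and step3 keep their own values.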

The window function first_value() with the right sort order is probably cheapest:

SELECT session_id, step
     , COALESCE(device
              , first_value(device) OVER (PARTITION BY session_id ORDER BY device IS NULL, step)
               ) AS device
FROM   tbl
ORDER  BY session_id DESC, step;

db<>fiddle here

ORDER BY device IS NULL, step sorts NULL values last, so the earliest step with a non-null value is picked. See:

  • Sorting null values after all others, except special
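
To see why this ordering works, here is a small sketch on made-up values (the inline VALUES list and the alias t are only for illustration). The expression device IS NULL yields a boolean, and false sorts before true, so rows with a value come first:

```sql
SELECT device, device IS NULL AS is_null
FROM  (VALUES ('mobile'), (NULL), ('desktop')) AS t(device)
ORDER  BY device IS NULL, device;
-- Rows come back as: desktop (false), mobile (false), NULL (true)
```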

If the non-null device is always the same per session_id, you can simplify to ORDER BY device IS NULL. And you don't need COALESCE.
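
A sketch of that simplified form, assuming at most one distinct non-null device per session_id. The sample rows from the question are inlined as a CTE named tbl so the query runs on its own:

```sql
WITH tbl (session_id, step, device) AS (
  VALUES ('351acc','step1',NULL),
         ('351acc','step2',NULL),
         ('351acc','step3','mobile'),
         ('351acc','step4','mobile'),
         ('350bca','step1','desktop'),
         ('350bca','step2',NULL),
         ('350bca','step3',NULL),
         ('350bca','step4','desktop')
)
SELECT session_id, step
       -- Non-null rows sort first, so first_value() always lands on a non-null device
       -- (when the session has one); no COALESCE needed under the stated assumption.
     , first_value(device) OVER (PARTITION BY session_id ORDER BY device IS NULL) AS device
FROM   tbl
ORDER  BY session_id DESC, step;
```

Every 351acc row comes back as 'mobile' and every 350bca row as 'desktop', matching the expected output in the question.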