Categorizing Data based on Partition

I have some data that looks like this:

fac_id     unit_id     created_at           rentable     r_id     row_num
1          1           2021-01-04 13:22:00  TRUE         1        1
1          1           2021-01-05 13:22:00  TRUE         2        2
1          1           2021-01-06 13:22:00  FALSE        1        3
1          1           2021-01-07 13:22:00  FALSE        2        4
1          1           2021-01-08 13:22:00  TRUE         1        5
1          1           2021-01-09 13:22:00  TRUE         2        6
1          1           2021-01-10 13:22:00  TRUE         3        7

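What I want to do is partition the data by fac_id, unit_id, and rentable. Every time the rows switch value (for example from rentable = TRUE to rentable = FALSE) for the same fac_id and unit_id, I want classify_num to increment by 1, so that the data looks like this.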

Expected output:

fac_id     unit_id     created_at           rentable     r_id     row_num   classify_num
1          1           2021-01-04 13:22:00  TRUE         1        1         1            
1          1           2021-01-05 13:22:00  TRUE         2        2         1
1          1           2021-01-06 13:22:00  FALSE        1        3         2
1          1           2021-01-07 13:22:00  FALSE        2        4         2
1          1           2021-01-08 13:22:00  TRUE         1        5         3
1          1           2021-01-09 13:22:00  TRUE         2        6         3
1          1           2021-01-10 13:22:00  TRUE         3        7         3

You can use window functions such as LAG() and SUM():

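WITH t2 AS
(
 SELECT *,
        -- previous rentable value within the same fac_id/unit_id (NULL on the first row)
        LAG(rentable) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS lg
   FROM t -- your original table
)
SELECT *,
       -- running count of changes; the NULL comparison on the first row falls through to ELSE 1
       SUM(CASE WHEN rentable = lg THEN 0 ELSE 1 END)
         OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS classify_num
  FROM t2
 ORDER BY fac_id, unit_id, created_at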
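
To check the idea against the sample rows from the question, here is a minimal sketch that runs the LAG()/SUM() query on an in-memory SQLite database from Python. SQLite (3.25 or newer, for window functions) is only an assumption made to keep the example self-contained. The table name t comes from the query above, and TRUE/FALSE are stored as 1/0 because SQLite has no separate boolean type.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (fac_id INT, unit_id INT, created_at TEXT, rentable INT)"
)
# Sample rows from the question; TRUE/FALSE are stored as 1/0.
rows = [
    (1, 1, "2021-01-04 13:22:00", 1),
    (1, 1, "2021-01-05 13:22:00", 1),
    (1, 1, "2021-01-06 13:22:00", 0),
    (1, 1, "2021-01-07 13:22:00", 0),
    (1, 1, "2021-01-08 13:22:00", 1),
    (1, 1, "2021-01-09 13:22:00", 1),
    (1, 1, "2021-01-10 13:22:00", 1),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
WITH t2 AS (
  SELECT *,
         LAG(rentable) OVER (PARTITION BY fac_id, unit_id
                             ORDER BY created_at) AS lg
    FROM t
)
SELECT created_at,
       SUM(CASE WHEN rentable = lg THEN 0 ELSE 1 END)
         OVER (PARTITION BY fac_id, unit_id
               ORDER BY created_at) AS classify_num
  FROM t2
 ORDER BY fac_id, unit_id, created_at
"""
# Collect classify_num for each row in created_at order.
classify_nums = [num for _, num in conn.execute(query)]
print(classify_nums)
```

Both window clauses partition by fac_id and unit_id, so the counter would restart for each unit if the table held more than one.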

You can use window functions to reference a value from another row. However, window functions cannot be nested, so you can compute the intermediate value in a CTE. This approach uses the sum and lag functions.

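WITH my_values AS (
  SELECT
      fac_id,
      unit_id,
      created_at,
      rentable,
      r_id,
      row_num,
      -- returns 1 if rentable differs from the previous row of the same unit;
      -- lag() yields NULL on the first row, so that row also gets 1
      CASE WHEN (lag(rentable) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at)) = rentable
        THEN 0
        ELSE 1
      END AS is_change
    FROM my_table
)
SELECT
    fac_id,
    unit_id,
    created_at,
    rentable,
    r_id,
    row_num,
    -- sums the change flags for each row up to the current row
    sum(is_change) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS classify_num
  FROM my_values
  ORDER BY fac_id, unit_id, created_at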
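
The "flag each change, then take a running sum" logic can also be walked through outside SQL. The sketch below is plain Python, not part of either answer. It assumes the rows are already sorted by created_at within each unit and that the first row of a unit counts as a change, so numbering starts at 1 as in the expected output. The rows for unit_id 2 are made up to show the counter restarting per (fac_id, unit_id), and the function name classify is hypothetical.

```python
from itertools import accumulate, groupby

# (fac_id, unit_id, rentable) in created_at order; the unit_id 2 rows are a
# made-up addition to show that numbering restarts per (fac_id, unit_id).
rows = [
    (1, 1, True), (1, 1, True), (1, 1, False), (1, 1, False),
    (1, 1, True), (1, 1, True), (1, 1, True),
    (1, 2, False), (1, 2, True),
]

def classify(rows):
    result = []
    # groupby plays the role of PARTITION BY fac_id, unit_id.
    for _, group in groupby(rows, key=lambda r: (r[0], r[1])):
        values = [r[2] for r in group]
        # Flag 1 for the first row and for every row that differs from the previous one.
        flags = [1] + [int(cur != prev) for prev, cur in zip(values, values[1:])]
        # The running total of the flags plays the role of SUM(...) OVER (ORDER BY ...).
        result.extend(accumulate(flags))
    return result

classify_nums = classify(rows)
print(classify_nums)
```

For unit 1 the flags are 1, 0, 1, 0, 1, 0, 0, and their running total gives the classify_num column from the question.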
