Categorizing Data Based on Partition
I have some data that looks like this:
fac_id unit_id created_at rentable r_id row_num
1 1 2021-01-04 13:22:00 TRUE 1 1
1 1 2021-01-05 13:22:00 TRUE 2 2
1 1 2021-01-06 13:22:00 FALSE 1 3
1 1 2021-01-07 13:22:00 FALSE 2 4
1 1 2021-01-08 13:22:00 TRUE 1 5
1 1 2021-01-09 13:22:00 TRUE 2 6
1 1 2021-01-10 13:22:00 TRUE 3 7
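What I want to do is partition the data by fac_id, unit_id, and rentable. Each time the rows for the same fac_id and unit_id values switch from rentable = TRUE to rentable = FALSE (or back again, as in the expected output), I want the classify_num value to increase by 1, so that the data looks like this.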
Expected output:
fac_id unit_id created_at rentable r_id row_num classify_num
1 1 2021-01-04 13:22:00 TRUE 1 1 1
1 1 2021-01-05 13:22:00 TRUE 2 2 1
1 1 2021-01-06 13:22:00 FALSE 1 3 2
1 1 2021-01-07 13:22:00 FALSE 2 4 2
1 1 2021-01-08 13:22:00 TRUE 1 5 3
1 1 2021-01-09 13:22:00 TRUE 2 6 3
1 1 2021-01-10 13:22:00 TRUE 3 7 3
You can use the LAG() and SUM() window functions:
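WITH t2 AS
(
    SELECT *,
           -- previous rentable value within the same fac_id/unit_id
           LAG(rentable) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS lg
    FROM t -- your original table
)
SELECT *,
       -- lg is NULL on the first row of each unit, so that row counts as a change
       -- and numbering starts at 1; the running sum restarts for every unit
       SUM(CASE WHEN rentable = lg THEN 0 ELSE 1 END)
           OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS classify_num
FROM t2
ORDER BY fac_id, unit_id, created_at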
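To try the LAG()/SUM() idea end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are my assumptions, not part of the question: SQLite 3.25 or newer (needed for window functions), rentable stored as 0/1 because SQLite has no boolean type, and a hypothetical second unit (unit_id = 2) added only to show that the count restarts per unit when both window clauses are partitioned by fac_id and unit_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (fac_id INTEGER, unit_id INTEGER, created_at TEXT, rentable INTEGER)"
)
rows = [
    # rows from the question (TRUE/FALSE stored as 1/0)
    (1, 1, "2021-01-04 13:22:00", 1),
    (1, 1, "2021-01-05 13:22:00", 1),
    (1, 1, "2021-01-06 13:22:00", 0),
    (1, 1, "2021-01-07 13:22:00", 0),
    (1, 1, "2021-01-08 13:22:00", 1),
    (1, 1, "2021-01-09 13:22:00", 1),
    (1, 1, "2021-01-10 13:22:00", 1),
    # hypothetical second unit, not in the question
    (1, 2, "2021-01-04 13:22:00", 0),
    (1, 2, "2021-01-05 13:22:00", 1),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH t2 AS (
    SELECT *,
           LAG(rentable) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS lg
    FROM t
)
SELECT fac_id, unit_id, created_at, rentable,
       -- NULL lg (first row of a unit) falls through to ELSE 1
       SUM(CASE WHEN rentable = lg THEN 0 ELSE 1 END)
           OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS classify_num
FROM t2
ORDER BY fac_id, unit_id, created_at
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With this sample data, unit 1 gets classify_num 1, 1, 2, 2, 3, 3, 3 as in the expected output, and the hypothetical unit 2 starts again at 1. Without PARTITION BY in the SUM() window, the running total would carry over from one unit to the next.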
You can use window functions to reference a value from another row. However, window functions cannot be nested, so you can compute the intermediate value in a CTE first. This one uses the sum and lag functions.
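WITH my_values AS (
    SELECT
        fac_id,
        unit_id,
        created_at,
        rentable,
        r_id,
        row_num,
        -- returns 1 if rentable differs from the previous row of the same unit;
        -- r_id repeats in the sample data, so order by created_at instead
        CASE WHEN (lag(rentable, 1, rentable)
                   OVER (PARTITION BY fac_id, unit_id ORDER BY created_at)) = rentable
            THEN 0
            ELSE 1
        END AS is_change
    FROM my_table
)
SELECT
    fac_id,
    unit_id,
    created_at,
    rentable,
    r_id,
    row_num,
    -- sums the change flags up to the current row; the first row's flag is 0
    -- (lag defaults to the row's own value), so add 1 to start numbering at 1
    1 + sum(is_change) OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) AS classify_num
FROM my_values
ORDER BY fac_id, unit_id, created_at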
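To see how the change flag and the running sum fit together, here is a small sketch that computes only the flag in SQL and then accumulates it in Python. The assumptions are mine: an in-memory SQLite database (3.25 or newer) stands in for the real one, rentable is stored as 0/1, rows are ordered by created_at within each fac_id/unit_id, and the column name is_change is just illustrative.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (fac_id INTEGER, unit_id INTEGER, created_at TEXT, rentable INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (1, 1, ?, ?)",
    [
        ("2021-01-04 13:22:00", 1),
        ("2021-01-05 13:22:00", 1),
        ("2021-01-06 13:22:00", 0),
        ("2021-01-07 13:22:00", 0),
        ("2021-01-08 13:22:00", 1),
        ("2021-01-09 13:22:00", 1),
        ("2021-01-10 13:22:00", 1),
    ],
)

FLAG_QUERY = """
SELECT created_at,
       -- the third argument makes lag() return the row's own value when there
       -- is no previous row, so the first row is never flagged as a change
       CASE WHEN lag(rentable, 1, rentable)
                 OVER (PARTITION BY fac_id, unit_id ORDER BY created_at) = rentable
            THEN 0 ELSE 1 END AS is_change
FROM my_table
ORDER BY created_at
"""

flags = [row[1] for row in conn.execute(FLAG_QUERY)]

# The windowed sum() in SQL is a running total of these flags; adding 1 makes
# the first group start at 1 instead of 0.
classify_num = [1 + total for total in accumulate(flags)]

print(flags)
print(classify_num)
```

The flags mark exactly the rows where rentable flips, and the running total turns those marks into group numbers. If you use the two-argument or one-argument form of lag instead, the first row compares against NULL, falls into the ELSE branch, and the extra 1 is not needed.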