SQL: dense rank over a revolving group pattern
Say I have a table like this:
store | date | is_open |
---|---|---|
Bay | 1/1/2022 | true |
Bay | 1/2/2022 | true |
Bay | 1/3/2022 | true |
Bay | 1/4/2022 | false |
Bay | 1/5/2022 | false |
Bay | 1/6/2022 | false |
Bay | 1/7/2022 | true |
Bay | 1/8/2022 | true |
Bay | 1/9/2022 | true |
Walmart | 1/7/2022 | true |
Walmart | 1/8/2022 | false |
Walmart | 1/9/2022 | true |
I want to use PARTITION BY to number the groups, like this:
store | date | is_open | group |
---|---|---|---|
Bay | 1/1/2022 | true | 1 |
Bay | 1/2/2022 | true | 1 |
Bay | 1/3/2022 | true | 1 |
Bay | 1/4/2022 | false | 2 |
Bay | 1/5/2022 | false | 2 |
Bay | 1/6/2022 | false | 2 |
Bay | 1/7/2022 | true | 3 |
Bay | 1/8/2022 | true | 3 |
Bay | 1/9/2022 | true | 3 |
Walmart | 1/7/2022 | true | 1 |
Walmart | 1/8/2022 | false | 2 |
Walmart | 1/9/2022 | true | 3 |
I started by partitioning by store and is_open, but I am really confused about what to use in the ORDER BY clause. Any help would be appreciated.
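This is really a gaps-and-islands problem. One approach uses the difference of row numbers:
WITH cte AS (
    -- rn1 numbers every row per store; rn2 numbers rows per store and is_open value
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY store ORDER BY date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY store, is_open ORDER BY date) AS rn2
    FROM yourTable t
),
cte2 AS (
    -- rn1 - rn2 is constant within an island; min_date is the island's first date
    SELECT t.*,
           MIN(date) OVER (PARTITION BY store, is_open, rn1 - rn2) AS min_date
    FROM cte t
)
SELECT store, date, is_open,
       DENSE_RANK() OVER (PARTITION BY store ORDER BY min_date) AS "group"
FROM cte2
ORDER BY store, date;
Note that the second CTE, cte2, finds the minimum date of each island. The difference rn1 - rn2 cannot identify or order the islands by itself: two islands with different is_open values (true/false) can share the same difference, and the difference does not always grow from one island to the next when the runs have unequal lengths. Ranking by each island's minimum date numbers the islands in chronological order within a store.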
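To see the shared row-number difference concretely, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python and computes rn1 - rn2 per row. The Python/SQLite harness, the ISO date strings, and the 1/0 encoding of is_open are assumptions made for this demo and are not part of the question; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings so they sort correctly
# (an assumption for this demo), and is_open stored as 1/0.
rows = [
    ("Bay", "2022-01-01", 1), ("Bay", "2022-01-02", 1), ("Bay", "2022-01-03", 1),
    ("Bay", "2022-01-04", 0), ("Bay", "2022-01-05", 0), ("Bay", "2022-01-06", 0),
    ("Bay", "2022-01-07", 1), ("Bay", "2022-01-08", 1), ("Bay", "2022-01-09", 1),
    ("Walmart", "2022-01-07", 1), ("Walmart", "2022-01-08", 0), ("Walmart", "2022-01-09", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (store TEXT, date TEXT, is_open INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

query = """
SELECT store, date, is_open,
       ROW_NUMBER() OVER (PARTITION BY store ORDER BY date) AS rn1,
       ROW_NUMBER() OVER (PARTITION BY store, is_open ORDER BY date) AS rn2
FROM yourTable
ORDER BY store, date
"""

# Keep (store, date, is_open, rn1 - rn2) for every row.
diffs = [(store, date, is_open, rn1 - rn2)
         for store, date, is_open, rn1, rn2 in conn.execute(query)]

for row in diffs:
    print(row)
```

For Bay, the false island (1/4 to 1/6) and the second true island (1/7 to 1/9) both come out with a difference of 3, so the difference alone cannot tell them apart. That is what the per-island minimum date is for.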
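You can use LAG() to detect the start of each group. A running sum of the start flags then gives the group number.
WITH cte AS (
    -- sflag is 1 on the first row of a store and whenever is_open changes, else 0
    SELECT t.*,
           CASE WHEN LAG(is_open) OVER (PARTITION BY store ORDER BY date) = is_open
                THEN 0 ELSE 1 END AS sflag
    FROM yourTable t
)
SELECT store, date, is_open,
       SUM(sflag) OVER (PARTITION BY store ORDER BY date) AS grp
FROM cte
ORDER BY store, date;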
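As a quick check of the LAG() approach against the expected output in the question, the sketch below runs the query on the sample rows in an in-memory SQLite database. As before, the Python/SQLite harness, the ISO date strings, and the 1/0 encoding of is_open are assumptions for the demo only; SQLite 3.25 or later is required for window functions.

```python
import sqlite3

# Sample rows from the question (ISO dates and 1/0 flags are demo assumptions).
rows = [
    ("Bay", "2022-01-01", 1), ("Bay", "2022-01-02", 1), ("Bay", "2022-01-03", 1),
    ("Bay", "2022-01-04", 0), ("Bay", "2022-01-05", 0), ("Bay", "2022-01-06", 0),
    ("Bay", "2022-01-07", 1), ("Bay", "2022-01-08", 1), ("Bay", "2022-01-09", 1),
    ("Walmart", "2022-01-07", 1), ("Walmart", "2022-01-08", 0), ("Walmart", "2022-01-09", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (store TEXT, date TEXT, is_open INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    -- Flag the first row of each store and every row where is_open changes.
    SELECT t.*,
           CASE WHEN LAG(is_open) OVER (PARTITION BY store ORDER BY date) = is_open
                THEN 0 ELSE 1 END AS sflag
    FROM yourTable t
)
SELECT store, date, is_open,
       SUM(sflag) OVER (PARTITION BY store ORDER BY date) AS grp
FROM cte
ORDER BY store, date
"""

# Collect the group numbers per store, in date order.
groups = {}
for store, date, is_open, grp in conn.execute(query):
    groups.setdefault(store, []).append(grp)

print(groups)
# {'Bay': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Walmart': [1, 2, 3]}
```

The first row of each store has no previous row, so LAG() returns NULL, the comparison is not true, and the CASE falls through to 1. That is why numbering starts at 1 without any special handling.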