Sum Consecutive Months Based on Groups with Criteria
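I'm having trouble isolating the runs of consecutive months in which a fruit's sales came from the top region. I know I need some form of window function with Row_Number or Dense_Rank, but I can't get to the final output.
Here is my source data: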
+--------+-----------+------------+
| Fruit | SaleDate | Top_Region |
+--------+-----------+------------+
| Apple | 1/1/2017 | 1 |
| Apple | 2/1/2017 | 1 |
| Apple | 3/1/2017 | 1 |
| Apple | 4/1/2017 | 0 |
| Apple | 5/1/2017 | 0 |
| Apple | 6/1/2017 | 0 |
| Apple | 7/1/2017 | 1 |
| Apple | 8/1/2017 | 1 |
| Apple | 9/1/2017 | 1 |
| Apple | 10/1/2017 | 1 |
| Apple | 11/1/2017 | 0 |
| Apple | 12/1/2017 | 0 |
| Banana | 1/1/2017 | 0 |
| Banana | 2/1/2017 | 0 |
| Banana | 3/1/2017 | 1 |
| Banana | 4/1/2017 | 1 |
| Banana | 5/1/2017 | 1 |
| Banana | 6/1/2017 | 1 |
| Banana | 7/1/2017 | 1 |
| Banana | 8/1/2017 | 1 |
| Banana | 9/1/2017 | 0 |
| Banana | 10/1/2017 | 1 |
| Banana | 11/1/2017 | 1 |
| Banana | 12/1/2017 | 0 |
+--------+-----------+------------+
Here is the expected output:
+--------+-----------+-----------+-------+
| Fruit | Start | End | Total |
+--------+-----------+-----------+-------+
| Apple | 1/1/2017 | 3/1/2017 | 3 |
| Apple | 7/1/2017 | 10/1/2017 | 4 |
| Banana | 3/1/2017 | 8/1/2017 | 6 |
| Banana | 10/1/2017 | 11/1/2017 | 2 |
+--------+-----------+-----------+-------+
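The goal is to return each run of consecutive months in which Top_Region is 1, and to leave out the months in which it is 0.
So far I've tried a few different combinations, and this is the closest I've gotten.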
SELECT fruit,
MIN(saledate) AS spanStart ,
MAX(saledate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT s.* ,
( ROW_NUMBER() OVER ( ORDER BY month )
- ROW_NUMBER() OVER ( PARTITION BY fruit, topregion ORDER BY month ) ) AS fruits
FROM #salesdata s
) s
GROUP BY fruit,fruits ,
topregion
HAVING topregion = 1
ORDER BY COUNT(*) DESC;
Any help would be appreciated.
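This is a typical gaps-and-islands problem. One strategy is to identify groups of adjacent rows by computing the difference between two row_number() values: within a run of rows that share the same top_region, that difference stays constant. We can then filter on top_region = 1 and use aggregation to get the start date, end date, and number of records of each group.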
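To make the row-number difference concrete, here is a minimal sketch of the same idea in plain Python. It is only an illustration, not the SQL solution itself: the function name `islands`, the variable `rows`, and the use of integer months 1 to 12 in place of SaleDate are assumptions made for the example.

```python
from collections import defaultdict

# Top_Region flags for January..December 2017, copied from the question's data.
apple = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
banana = [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]

# (fruit, month, top_region); the month number stands in for SaleDate.
rows = [("Apple", m, flag) for m, flag in enumerate(apple, start=1)]
rows += [("Banana", m, flag) for m, flag in enumerate(banana, start=1)]


def islands(rows):
    """Return (fruit, start_month, end_month, total) per run of top_region = 1."""
    rn1 = defaultdict(int)  # row number per fruit
    rn2 = defaultdict(int)  # row number per (fruit, top_region)
    groups = defaultdict(list)
    for fruit, month, top_region in sorted(rows):  # ordered by fruit, then month
        rn1[fruit] += 1
        rn2[fruit, top_region] += 1
        if top_region == 1:
            # rn1 - rn2 is constant inside a run of consecutive months
            diff = rn1[fruit] - rn2[fruit, top_region]
            groups[fruit, diff].append(month)
    return sorted(
        (fruit, min(months), max(months), len(months))
        for (fruit, _), months in groups.items()
    )


for island in islands(rows):
    print(island)
```

With the sample data this prints `('Apple', 1, 3, 3)`, `('Apple', 7, 10, 4)`, `('Banana', 3, 8, 6)` and `('Banana', 10, 11, 2)`, which lines up with the expected output in the question.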
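Your query is quite close, but the first row_number() is missing partition by fruit in its over() clause. Also, I find that aliasing a column as fruits when another column is already called fruit is error-prone.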
select
fruit,
min(sale_date) start_date,
max(sale_date) end_date,
count(*) total
from (
select
t.*,
row_number() over(partition by fruit order by sale_date) rn1,
row_number() over(partition by fruit, top_region order by sale_date) rn2
from mytable t
) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date
You can run the inner query on its own to see what it produces.
fruit | start_date | end_date | total
:----- | :--------- | :--------- | ----:
Apple | 2017-01-01 | 2017-01-03 | 3
Apple | 2017-01-07 | 2017-01-10 | 4
Banana | 2017-01-03 | 2017-01-08 | 6
Banana | 2017-01-10 | 2017-01-11 | 2
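For anyone who wants to try this without a database server, the following sketch runs the same query through Python's built-in sqlite3 module. Several things here are assumptions rather than part of the answer: the table is created in an in-memory SQLite database (window functions need SQLite 3.25 or later), the SaleDate values are read as month/day/year and stored as ISO strings for the first day of each month, and the names `inner_sql` and `full_sql` are made up for the example.

```python
import sqlite3

# Top_Region flags for January..December 2017, copied from the question's data.
apple = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
banana = [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (fruit text, sale_date text, top_region integer)")
for fruit, flags in (("Apple", apple), ("Banana", banana)):
    conn.executemany(
        "insert into mytable values (?, ?, ?)",
        [(fruit, f"2017-{m:02d}-01", flag) for m, flag in enumerate(flags, start=1)],
    )

# Inner query: the two row numbers whose difference identifies each island.
inner_sql = """
select fruit, sale_date, top_region,
       row_number() over(partition by fruit order by sale_date) as rn1,
       row_number() over(partition by fruit, top_region order by sale_date) as rn2
from mytable
"""

# Full query: keep top_region = 1 and aggregate per (fruit, rn1 - rn2).
full_sql = f"""
select fruit, min(sale_date) as start_date, max(sale_date) as end_date,
       count(*) as total
from ({inner_sql}) t
where top_region = 1
group by fruit, rn1 - rn2
order by fruit, start_date
"""

print("fruit, sale_date, top_region, rn1, rn2, rn1 - rn2")
for row in conn.execute(inner_sql + " order by fruit, sale_date"):
    print(*row, row[3] - row[4])

print("fruit, start_date, end_date, total")
for row in conn.execute(full_sql):
    print(*row)
```

In the inner result for Apple, the difference is 0 for January through March and 3 for July through October. The April through June rows, where top_region is 0, also produce 3, so the filter on top_region = 1 (or grouping by top_region as well) is what keeps those rows out of the second island.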