How can I aggregate by the top N categories with an "all others" and totals?
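I have tables that list a user's sales by category (each sale has at least one category and may have several).
I can get a user's top categories, but I need the user's stats for both his/her top N categories and the remaining categories.
I've boiled the problem down to an MCVE as follows...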
MCVE Data Summary:
Salesman SaleID Amount Categories
-------- ------ ------ ------------------------------
1        1      2      Service
2        2      2      Software, Support_Contract
2        3      3      Service
2        4      1      Parts, Service, Software
2        5      3      Support_Contract
2        6      4      Promo_Gift, Support_Contract
2        7      -2     Rebate, Support_Contract
3        8      2      Software, Support_Contract
3        9      3      Service
3        10     1      Parts, Software
3        11     3      Support_Contract
3        12     4      Promo_Gift, Support_Contract
3        13     -2     Rebate, Support_Contract
MCVE setup SQL:
CREATE TABLE Sales ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags ([SaleID] int, [TagId] int);
CREATE TABLE Tags ([TagId] int, [TagName] varchar(100) );
INSERT INTO Sales
([Salesman], [SaleID], [Amount])
VALUES
(1, 1, 2), (2, 6, 4), (3, 10, 1),
(2, 2, 2), (2, 7, -2), (3, 11, 3),
(2, 3, 3), (3, 8, 2), (3, 12, 4),
(2, 4, 1), (3, 9, 3), (3, 13, -2),
(2, 5, 3)
;
INSERT INTO SalesTags
([SaleID], [TagId])
VALUES
(1, 3), (6, 4), (10, 1),
(2, 1), (6, 5), (10, 2),
(2, 4), (7, 4), (11, 4),
(3, 3), (7, 6), (12, 4),
(4, 1), (8, 1), (12, 5),
(4, 2), (8, 4), (13, 4),
(4, 3), (9, 3), (13, 6),
(5, 4)
;
INSERT INTO Tags
([TagId], [TagName])
VALUES
(1, 'Software'),
(2, 'Parts'),
(3, 'Service'),
(4, 'Support_Contract'),
(5, 'Promo_Gift'),
(6, 'Rebate')
;
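To check the setup script against the data summary above, the summary can be rebuilt from the three tables. This is a sketch that assumes the MCVE tables exist and SQL Server 2017 or later (the version that introduced STRING_AGG with WITHIN GROUP); sorting the tag names alphabetically reproduces the Categories column as listed.

```sql
-- Rebuild the "MCVE Data Summary" from Sales, SalesTags and Tags.
-- Assumes the MCVE tables above; STRING_AGG ... WITHIN GROUP needs SQL Server 2017+.
SELECT
      s.Salesman
    , s.SaleID
    , s.Amount
    , STRING_AGG(t.TagName, ', ') WITHIN GROUP (ORDER BY t.TagName) AS Categories
FROM Sales s
INNER JOIN SalesTags st ON st.SaleID = s.SaleID
INNER JOIN Tags t ON t.TagId = st.TagId
GROUP BY s.Salesman, s.SaleID, s.Amount
ORDER BY s.SaleID;
```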
Using this SQL Fiddle, I can get the user's top N tags like this:
WITH usersSales AS ( -- actual base CTE is much more complex
    SELECT s.SaleID
         , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
SELECT TOP 3 -- N can be 3 to 10
      t.TagName
    , COUNT(us.SaleID) AS tagSales
    , SUM(us.Amount) AS tagAmount
FROM usersSales us
INNER JOIN SalesTags st ON st.SaleID = us.SaleID
INNER JOIN Tags t ON t.TagId = st.TagId
GROUP BY t.TagName
ORDER BY tagAmount DESC
       , tagSales DESC
       , t.TagName
Which shows the user's top categories to be:
- "Support_Contract"
- "Service"
- "Promo_Gift"
in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)
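Since N varies from 3 to 10, the hard-coded TOP 3 can be parameterised: T-SQL's TOP accepts a variable when it is wrapped in parentheses. A minimal sketch, where @Salesman and @N are illustrative variable names that are not part of the original query:

```sql
-- Parameterised version of the top-N tag query; @Salesman and @N are hypothetical names.
DECLARE @Salesman int = 2, @N int = 3;

SELECT TOP (@N)
      t.TagName
    , COUNT(s.SaleID) AS tagSales
    , SUM(s.Amount)   AS tagAmount
FROM Sales s
INNER JOIN SalesTags st ON st.SaleID = s.SaleID
INNER JOIN Tags t ON t.TagId = st.TagId
WHERE s.Salesman = @Salesman
GROUP BY t.TagName
ORDER BY tagAmount DESC, tagSales DESC, t.TagName;
```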
But for N=3, the required result is:
User 2:
Top Category     Amount Number of Sales
---------------- ------ ---------------
Support_Contract 7      4
Service          4      2
Promo_Gift       0      0
- All Others -   0      0
============================================
Totals           11     6
User 3:
Top Category     Amount Number of Sales
---------------- ------ ---------------
Support_Contract 7      4
Promo_Gift       0      0
Software         1      1
- All Others -   3      1
============================================
Totals           11     6
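Where:
- "Top Category" is the user's highest-ranked category for a given sale (according to the query above).
- The Top Category in row 2 excludes sales already accounted for in row 1.
- The Top Category in row 3 excludes sales already included in rows 1 and 2.
- Etc.
- All remaining sales that are not counted in the top N categories go into the "- All Others -" group.
- The totals at the bottom match the user's overall sales figures.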
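The totals rule implies that the totals row should come from Sales directly, with each sale counted once. Summing after the join to SalesTags would count a multi-tag sale once per tag: for salesman 2 the joined rows add up to 17 over 11 rows instead of 11 over 6 sales. A minimal sketch of the totals row, assuming the MCVE tables and hard-coding salesman 2 as the question does:

```sql
-- Totals row: aggregate Sales directly so each sale counts once,
-- no matter how many tags it carries. Assumes the MCVE tables.
SELECT
      'Totals'      AS TopCategory
    , SUM(s.Amount) AS Amount
    , COUNT(*)      AS NumberOfSales
FROM Sales s
WHERE s.Salesman = 2;
```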
How can I aggregate results like this?
Note that this runs on MS SQL Server 2017, and I cannot change the table schema.
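Here is one way to do it.
Run the query step by step, CTE by CTE, and inspect the intermediate results to see how it works.
It is not the most efficient approach, because I end up joining the table to itself to eliminate sales that were already counted, but at the moment I can't figure out how to avoid that.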
WITH usersSales
AS
( -- actual base CTE is much more complex
    SELECT
          s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
         t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
         TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
         Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            -- sales at this rank that did not already appear at a better (lower) rank
            SELECT
                 SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
     TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
     TagName
    ,rnk
UNION ALL
SELECT
     '- All Others -' AS TagName
    -- ISNULL: show 0 rather than NULL when there are no tags beyond the top 3
    ,ISNULL(SUM(FinalTagAmount), 0) AS FinalTagAmount
    ,ISNULL(SUM(FinalTagSales), 0) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM
(   -- CTE_Final repeats each tag's totals on every (tag, sale) row,
    -- so reduce it to one row per tag before summing
    SELECT DISTINCT rnk, FinalTagAmount, FinalTagSales
    FROM CTE_Final
) AS PerTag
WHERE rnk > 3
ORDER BY
     SortOrder
    ,rnk
;
CTE_Rank
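Don't group and summarise the rows yet; instead, use window functions to attach each tag's totals and rank to every row. Later we need the individual rows (SaleID) and the individual amounts to filter out the sales that are already used.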
+------------------+--------+--------+-----------+----------+-----+
| TagName          | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract | -2     | 7      | 7         | 4        | 1   |
| Support_Contract | 3      | 5      | 7         | 4        | 1   |
| Support_Contract | 4      | 6      | 7         | 4        | 1   |
| Support_Contract | 2      | 2      | 7         | 4        | 1   |
| Service          | 1      | 4      | 4         | 2        | 2   |
| Service          | 3      | 3      | 4         | 2        | 2   |
| Promo_Gift       | 4      | 6      | 4         | 1        | 3   |
| Software         | 1      | 4      | 3         | 2        | 4   |
| Software         | 2      | 2      | 3         | 2        | 4   |
| Parts            | 1      | 4      | 1         | 1        | 5   |
| Rebate           | -2     | 7      | -2        | 1        | 6   |
+------------------+--------+--------+-----------+----------+-----+
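The difference between window aggregates and GROUP BY can be seen on a few inline rows. This self-contained sketch uses a hand-copied subset of salesman 2's (tag, sale) pairs; every input row survives with its tag's total and count attached, whereas GROUP BY TagName would return one row per tag and lose SaleID.

```sql
-- Window aggregates keep every (tag, sale) row and attach per-tag totals.
-- The VALUES rows are a hand-copied subset of salesman 2's data, for illustration only.
SELECT
      v.TagName
    , v.SaleID
    , v.Amount
    , SUM(v.Amount) OVER (PARTITION BY v.TagName) AS TagAmount
    , COUNT(*)      OVER (PARTITION BY v.TagName) AS TagSales
FROM (VALUES
          ('Service',    3, 3)
        , ('Service',    4, 1)
        , ('Promo_Gift', 6, 4)
     ) AS v (TagName, SaleID, Amount)
ORDER BY v.TagName, v.SaleID;
```

All three rows come back, and both Service rows carry TagAmount = 4 and TagSales = 2, matching the CTE_Rank table above.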
CTE_Final
OUTER APPLY does the main calculation, filtering out sales that were already counted under higher-ranked tags.
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| TagName          | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract | -2     | 7      | 7         | 4        | 1   | 7              | 4             |
| Support_Contract | 3      | 5      | 7         | 4        | 1   | 7              | 4             |
| Support_Contract | 4      | 6      | 7         | 4        | 1   | 7              | 4             |
| Support_Contract | 2      | 2      | 7         | 4        | 1   | 7              | 4             |
| Service          | 1      | 4      | 4         | 2        | 2   | 4              | 2             |
| Service          | 3      | 3      | 4         | 2        | 2   | 4              | 2             |
| Promo_Gift       | 4      | 6      | 4         | 1        | 3   | 0              | 0             |
| Software         | 1      | 4      | 3         | 2        | 4   | 0              | 0             |
| Software         | 2      | 2      | 3         | 2        | 4   | 0              | 0             |
| Parts            | 1      | 4      | 1         | 1        | 5   | 0              | 0             |
| Rebate           | -2     | 7      | -2        | 1        | 6   | 0              | 0             |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
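The exclusion rule applied inside the OUTER APPLY can be tried on its own. This sketch copies the (rnk, SaleID) pairs from the CTE_Rank table above into an inline VALUES list and keeps only the pairs whose sale does not appear at a better rank. It uses NOT EXISTS instead of the answer's NOT IN; the two agree here because no SaleID is NULL, whereas NOT IN filters out every row if its subquery returns a NULL.

```sql
-- (rnk, SaleID) pairs hand-copied from the CTE_Rank table above (salesman 2).
WITH Ranked AS
(
    SELECT v.rnk, v.SaleID
    FROM (VALUES (1, 7), (1, 5), (1, 6), (1, 2),
                 (2, 4), (2, 3),
                 (3, 6),
                 (4, 4), (4, 2),
                 (5, 4),
                 (6, 7)) AS v (rnk, SaleID)
)
-- keep a pair only if no better (lower) rank already holds the same sale
SELECT r.rnk, r.SaleID
FROM Ranked AS r
WHERE NOT EXISTS
(
    SELECT 1
    FROM Ranked AS p
    WHERE p.SaleID = r.SaleID
      AND p.rnk < r.rnk
)
ORDER BY r.rnk, r.SaleID;
```

Only ranks 1 and 2 keep any sales (2, 5, 6, 7 and 3, 4), which is why FinalTagSales is 4, 2 and then 0 in the table above.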
Query result
This simply puts the top 3 tags and all the others together.
+------------------+----------------+---------------+-----+-----------+
| TagName          | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract | 7              | 4             | 1   | 0         |
| Service          | 4              | 2             | 2   | 0         |
| Promo_Gift       | 0              | 0             | 3   | 0         |
| - All Others -   | 0              | 0             | 0   | 1         |
+------------------+----------------+---------------+-----+-----------+
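One way to avoid the self-join mentioned in the answer, offered as a sketch using a different technique rather than as a tested drop-in: the rules in the question amount to assigning each sale to the best-ranked tag it carries. Ranking the tags once, taking MIN(rank) per sale, and aggregating that result gives the top-N rows, the "- All Others -" row and the totals in a single pass. It assumes the MCVE tables; @Salesman and @TopN are illustrative parameter names. Traced by hand on the MCVE data, it yields 7/4, 4/2, 0/0, 0/0 and totals 11/6 for salesman 2, and 7/4, 0/0, 1/1, 3/1 and totals 11/6 for salesman 3, matching the required output.

```sql
-- Sketch: attribute each sale to the best-ranked tag it carries, then aggregate once.
-- Assumes the MCVE tables; @Salesman and @TopN are hypothetical parameter names.
DECLARE @Salesman int = 2, @TopN int = 3;

WITH usersSales AS
(
    SELECT s.SaleID, s.Amount
    FROM Sales s
    WHERE s.Salesman = @Salesman
)
, TagRank AS
(   -- one row per tag, ordered like the question's TOP N query
    SELECT
          st.TagId
        , t.TagName
        , ROW_NUMBER() OVER (ORDER BY SUM(us.Amount) DESC, COUNT(*) DESC, t.TagName) AS rnk
    FROM usersSales us
    INNER JOIN SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN Tags t ON t.TagId = st.TagId
    GROUP BY st.TagId, t.TagName
)
, SaleBestRank AS
(   -- each sale is claimed by the lowest (best) rank among its tags
    SELECT us.SaleID, us.Amount, MIN(tr.rnk) AS rnk
    FROM usersSales us
    INNER JOIN SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN TagRank tr ON tr.TagId = st.TagId
    GROUP BY us.SaleID, us.Amount
)
-- top N tags; the LEFT JOIN keeps tags that end up with no sales (0 / 0)
SELECT
      tr.TagName                AS TopCategory
    , ISNULL(SUM(sb.Amount), 0) AS Amount
    , COUNT(sb.SaleID)          AS NumberOfSales
    , tr.rnk                    AS SortKey
FROM TagRank tr
LEFT JOIN SaleBestRank sb ON sb.rnk = tr.rnk
WHERE tr.rnk <= @TopN
GROUP BY tr.TagName, tr.rnk
UNION ALL
-- every sale whose best tag ranks below N
SELECT '- All Others -', ISNULL(SUM(sb.Amount), 0), COUNT(sb.SaleID), @TopN + 1
FROM SaleBestRank sb
WHERE sb.rnk > @TopN
UNION ALL
-- totals straight from the user's sales, one row per sale
SELECT 'Totals', ISNULL(SUM(us.Amount), 0), COUNT(us.SaleID), @TopN + 2
FROM usersSales us
ORDER BY SortKey;
```

Because each sale lands in exactly one bucket, the top-N rows plus "- All Others -" always add up to the totals row.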