Optimizing SUM OVER PARTITION BY for several hierarchical groups
I have a table that looks like this:
Region Country Manufacturer Brand Period Spend
R1 C1 M1 B1 2016 5
R1 C1 M1 B1 2017 10
R1 C1 M1 B1 2017 20
R1 C1 M1 B2 2016 15
R1 C1 M1 B3 2017 20
R1 C2 M1 B1 2017 5
R1 C2 M2 B4 2017 25
R1 C2 M2 B5 2017 30
R2 C3 M1 B1 2017 35
R2 C3 M2 B4 2017 40
R2 C3 M2 B5 2017 45
I need to find SUM([Spend]) for several different groupings, each within a Period:
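- Total spend of all rows in the table
- Total spend for each Region
- Total spend for each Region and Country group
- Total spend for each Region, Country and Manufacturer group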
So I wrote the following query:
SELECT
[Period]
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY [Period]) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY [Period], [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
But on a table of only 450K rows, this query takes more than 15 minutes. Is there any way to optimize its performance? Thanks in advance for any answers/suggestions!
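To make the expected output concrete, here is a minimal pure-Python sketch of what the four windowed sums evaluate to on the sample rows. It illustrates the semantics only, not how SQL Server executes the query. It assumes [Period] already holds the year, as in the sample, and the names `ROWS`, `LEVELS` and `window_sums` are hypothetical. Each PARTITION BY list becomes a dictionary key: one pass totals Spend per key, and a second pass copies each total back onto every row of its partition.

```python
from collections import defaultdict

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

COLUMNS = ("Region", "Country", "Manufacturer", "Brand", "Period", "Spend")

# Output column name -> the columns listed in its PARTITION BY clause.
LEVELS = {
    "SumOfSpendWorld": ("Period",),
    "SumOfSpendRegion": ("Period", "Region"),
    "SumOfSpendCountry": ("Period", "Region", "Country"),
    "SumOfSpendManufacturer": ("Period", "Region", "Country", "Manufacturer"),
}


def window_sums(rows):
    """Return one dict per input row with the four partition sums attached."""
    records = [dict(zip(COLUMNS, row)) for row in rows]

    # Pass 1: accumulate Spend per partition key, separately for each level.
    totals = {name: defaultdict(int) for name in LEVELS}
    for rec in records:
        for name, keys in LEVELS.items():
            totals[name][tuple(rec[k] for k in keys)] += rec["Spend"]

    # Pass 2: broadcast each partition total back onto every row of that partition.
    result = []
    for rec in records:
        out = {k: rec[k] for k in ("Period", "Region", "Country", "Manufacturer", "Brand")}
        for name, keys in LEVELS.items():
            out[name] = totals[name][tuple(rec[k] for k in keys)]
        result.append(out)
    return result


if __name__ == "__main__":
    for row in window_sums(ROWS):
        print(row)
```

For the 2017 row in R1/C2/M2/B4, for instance, this gives 230, 110, 60 and 55 for the world, region, country and manufacturer sums.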
Use CROSS APPLY here to compute YEAR([Period]) once and speed up the query:
SELECT
periodyear
,[Region]
,[Country]
,[Manufacturer]
,[Brand]
,SUM([Spend]) OVER (PARTITION BY periodyear) AS [SumOfSpendWorld]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region]) AS [SumOfSpendRegion]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country]) AS [SumOfSpendCountry]
,SUM([Spend]) OVER (PARTITION BY periodyear, [Region], [Country], [Manufacturer]) AS [SumOfSpendManufacturer]
FROM myTable
cross apply (select YEAR([Period]) periodyear) a
Your description of the problem suggests GROUPING SETS to me:
SELECT YEAR([Period]) AS [Period], [Region], [Country], [Manufacturer],
SUM([Spend]) AS [SumOfSpend]
FROM myTable
GROUP BY GROUPING SETS ( (YEAR([Period])),
(YEAR([Period]), [Region]),
(YEAR([Period]), [Region], [Country]),
(YEAR([Period]), [Region], [Country], [Manufacturer])
);
I don't know whether this will be faster, but it certainly fits your problem description more closely.
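To show the shape of a GROUPING SETS result, here is a small sketch in Python using the standard `sqlite3` module as a stand-in for SQL Server. SQLite does not implement GROUPING SETS, so each set is emulated as its own GROUP BY and the branches are combined with UNION ALL; the function name `grouping_sets` is hypothetical, and [Period] is assumed to hold the year, as in the sample. Unlike SUM() OVER(), the result has one row per group per level (15 rows for the sample data, not 11), with NULL in the columns a given set does not group by.

```python
import sqlite3

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

# One GROUP BY branch per grouping set; columns outside a set come back as NULL,
# which is what GROUPING SETS returns for rolled-up columns.
GROUPING_SETS_SQL = """
SELECT [Period], NULL AS [Region], NULL AS [Country], NULL AS [Manufacturer],
       SUM([Spend]) AS [SumOfSpend]
FROM myTable GROUP BY [Period]
UNION ALL
SELECT [Period], [Region], NULL, NULL, SUM([Spend])
FROM myTable GROUP BY [Period], [Region]
UNION ALL
SELECT [Period], [Region], [Country], NULL, SUM([Spend])
FROM myTable GROUP BY [Period], [Region], [Country]
UNION ALL
SELECT [Period], [Region], [Country], [Manufacturer], SUM([Spend])
FROM myTable GROUP BY [Period], [Region], [Country], [Manufacturer]
"""


def grouping_sets(rows):
    """Load the rows into an in-memory table and return the emulated GROUPING SETS rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(GROUPING_SETS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in grouping_sets(ROWS):
        print(row)
```

On SQL Server the four branches collapse into the single GROUPING SETS statement; if the grouping columns can themselves contain NULL, the GROUPING() function tells a rolled-up NULL apart from a real one. Per-row totals like those in the original query would still require joining these rows back to myTable.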
An old-school alternative to SUM() OVER():
SELECT
[Period]
, [Region]
, [Country]
, [Manufacturer]
, [Brand]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] GROUP BY [Period]) AS [SumOfSpendWorld]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region GROUP BY [Period], [Region] ) AS [SumOfSpendRegion]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country GROUP BY [Period], [Region], [Country] ) AS [SumOfSpendCountry]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country AND e.Manufacturer = t.Manufacturer GROUP BY [Period], [Region], [Country], [Manufacturer] ) AS [SumOfSpendManufacturer]
FROM myTable e
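As a quick check that the correlated subqueries produce the same totals as the windowed query, here is a sketch that runs the query above unchanged in SQLite through Python's `sqlite3` module (a stand-in for SQL Server, with [Period] assumed to hold the year). The function name `correlated_sums` and the index `ix_partition` are hypothetical. Each subquery is evaluated per outer row, so on 450K rows this approach depends heavily on an index over the partition columns; the sketch creates one to illustrate that, not to claim a particular plan.

```python
import sqlite3

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

# The answer's query as written; SQLite accepts the [bracket] identifier quoting.
CORRELATED_SQL = """
SELECT
  [Period]
, [Region]
, [Country]
, [Manufacturer]
, [Brand]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] GROUP BY [Period]) AS [SumOfSpendWorld]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region GROUP BY [Period], [Region] ) AS [SumOfSpendRegion]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country GROUP BY [Period], [Region], [Country] ) AS [SumOfSpendCountry]
, (SELECT SUM([Spend]) FROM myTable t WHERE e.[Period] = t.[Period] AND e.Region = t.Region AND e.Country = t.Country AND e.Manufacturer = t.Manufacturer GROUP BY [Period], [Region], [Country], [Manufacturer] ) AS [SumOfSpendManufacturer]
FROM myTable e
"""


def correlated_sums(rows):
    """Run the correlated-subquery version against an in-memory copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Hypothetical index on the partition columns; it may let the subqueries seek
    # matching rows instead of scanning the whole table for every outer row.
    conn.execute("CREATE INDEX ix_partition ON myTable (Period, Region, Country, Manufacturer)")
    result = conn.execute(CORRELATED_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in correlated_sums(ROWS):
        print(row)
```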
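While this isn't an elegant approach, it gets the job done. I strongly recommend examining and analyzing the table to see which alternative best fits your situation. If you feel you've hit a dead end, I suggest using temp tables to speed things up.
For example, you could select the rows for each Period, bulk-copy them directly into a temp table, and then work your magic there. I've seen tables that forced me to use temp tables instead of a simple SELECT query. Others forced me to split the table into two tables.
So it's not always neat and tidy!
I hope this gives you another perspective and helps you along the way.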
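The temp-table suggestion can be sketched like this, again in Python with SQLite standing in for SQL Server. This is one possible reading of "work your magic", not the answerer's exact method: the table and function names (`sum_mfr`, `sum_country`, `sum_region`, `sum_world`, `temp_table_sums`) are hypothetical, [Period] is assumed to hold the year, and the grouping columns are assumed to be non-NULL, because NULL keys would not match in the equality joins. Only the finest level is aggregated from myTable; each coarser level is rolled up from the previous temp table, which is usually much smaller than the detail table, and the four totals are then joined back onto the detail rows.

```python
import sqlite3

# Sample rows from the question: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

SETUP_SQL = """
-- Finest level, aggregated once from the detail table.
CREATE TEMP TABLE sum_mfr AS
SELECT Period, Region, Country, Manufacturer, SUM(Spend) AS SumOfSpendManufacturer
FROM myTable GROUP BY Period, Region, Country, Manufacturer;

-- Coarser levels, each rolled up from the previous (smaller) temp table.
CREATE TEMP TABLE sum_country AS
SELECT Period, Region, Country, SUM(SumOfSpendManufacturer) AS SumOfSpendCountry
FROM sum_mfr GROUP BY Period, Region, Country;

CREATE TEMP TABLE sum_region AS
SELECT Period, Region, SUM(SumOfSpendCountry) AS SumOfSpendRegion
FROM sum_country GROUP BY Period, Region;

CREATE TEMP TABLE sum_world AS
SELECT Period, SUM(SumOfSpendRegion) AS SumOfSpendWorld
FROM sum_region GROUP BY Period;
"""

# Attach every level's total to each detail row via equality joins on the keys.
JOIN_SQL = """
SELECT e.Period, e.Region, e.Country, e.Manufacturer, e.Brand,
       w.SumOfSpendWorld, r.SumOfSpendRegion, c.SumOfSpendCountry, m.SumOfSpendManufacturer
FROM myTable e
JOIN sum_world w ON w.Period = e.Period
JOIN sum_region r ON r.Period = e.Period AND r.Region = e.Region
JOIN sum_country c ON c.Period = e.Period AND c.Region = e.Region AND c.Country = e.Country
JOIN sum_mfr m ON m.Period = e.Period AND m.Region = e.Region
              AND m.Country = e.Country AND m.Manufacturer = e.Manufacturer
"""


def temp_table_sums(rows):
    """Build the per-level temp tables, then join their totals back to the detail rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Region TEXT, Country TEXT, Manufacturer TEXT,"
        " Brand TEXT, Period INTEGER, Spend INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.executescript(SETUP_SQL)
    result = conn.execute(JOIN_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in temp_table_sums(ROWS):
        print(row)
```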