How to resolve a nested aggregate function error?
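I'm using a CASE expression to bucket an aggregated count of items, and I want to divide each bucket's value by the sum of all the counts (i.e. show each bucket as a percentage of the total). However, I get an error saying aggregate functions cannot be nested. I understand why, but I need some help finding an alternative way to achieve my goal.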
Error:
Aggregate functions cannot be nested: [COUNT("values".CASE_AGE_CATEGORY)] nested in [SUM(COUNT("values".CASE_AGE_CATEGORY))]
Code:
SELECT Case_Age_Category, COUNT(Case_Age_Category)/sum(count(Case_Age_Category)) as Volume
FROM
(
SELECT DISTINCT(c.CASE_ID),c.CLOSED_AT,
CASE
WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >0
AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <24 then '0-24 HOURS'
WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >24
AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <48 then '24-48 HOURS'
WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >48
AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <72 then '48-72 HOURS'
WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >72
AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <96 then '72-96 HOURS'
WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >96
AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <120 then '96-120 HOURS'
else '>5 DAYS'
End as Case_Age_Category
FROM TEST_DB.STAGING.DW_DECISIV_CASES c inner join DB.Seed.DEALER_MAPPING d on c.DEALER_ID = d.DECISIVDEALERID
WHERE d.DIVISION = 'K'
and RO_NUMBER is not NULL
and (d.DEALERCATEGORY ILIKE 'DEALER' OR d.DEALERCATEGORY ILIKE 'RTC')
and d.DEALERUSAGE ILIKE 'PRODUCTION'
and d.OWNERGROUPCODE !='S040'
)
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC
Screenshot of current output (image not available; alt text: "looking to have each bucket shown as a % of the total")
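Combine COUNT(...) with a windowed SUM() OVER() to get the total across all groups. The window function is evaluated after GROUP BY, so it can take the per-group COUNT as its argument without nesting one aggregate inside another; DIV0 returns 0 instead of raising an error if the total is 0: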
SELECT Case_Age_Category,
DIV0(COUNT(Case_Age_Category), SUM(COUNT(Case_Age_Category)) OVER()) as Volume
FROM
(
-- ...
) sub
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC
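To try the pattern without a Snowflake account, here is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.25 or later, which added window functions). The table cases and its bucket column are hypothetical stand-ins for the subquery. SQLite has no DIV0, so NULLIF guards the division instead (a zero total gives NULL rather than 0).

```python
import sqlite3

def bucket_shares(buckets):
    """Return {bucket: share of total} using SUM(COUNT(*)) OVER ()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (bucket TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO cases VALUES (?)", [(b,) for b in buckets])
    rows = conn.execute(
        """
        SELECT bucket,
               -- the window runs after GROUP BY, so it can sum the per-group counts
               1.0 * COUNT(*) / NULLIF(SUM(COUNT(*)) OVER (), 0) AS share
        FROM cases
        GROUP BY bucket
        ORDER BY bucket
        """
    ).fetchall()
    conn.close()
    return dict(rows)

print(bucket_shares(["0-24 HOURS", "0-24 HOURS", "0-24 HOURS", "24-48 HOURS"]))
```

The 1.0 multiplier forces floating-point division; without it, integer division would truncate every share to 0.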
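There are a few problems with your time bucketing:
- You don't capture negative or zero hours, so these get classified as '>5 DAYS'.
- You never match exactly 24, 48, 72 or 96 hours (the strict > and < exclude them), so those also fall into '>5 DAYS'.
- TIMEDIFF reports a 2-second difference that crosses an hour boundary as 1 hour. That may not be a problem, but be aware of it.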
SELECT
ASSET_CHECKED_IN_AT,
STATUS_CHANGED_TO_COMPLETE_HERE_AT,
TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
CASE
WHEN hour_diff >0 AND hour_diff <24 then '0-24 HOURS'
WHEN hour_diff >24 AND hour_diff <48 then '24-48 HOURS'
WHEN hour_diff >48 AND hour_diff <72 then '48-72 HOURS'
WHEN hour_diff >72 AND hour_diff <96 then '72-96 HOURS'
WHEN hour_diff >96 AND hour_diff <120 then '96-120 HOURS'
else '>5 DAYS'
End as Case_Age_Category,
(date_part(epoch_second, STATUS_CHANGED_TO_COMPLETE_HERE_AT::timestamp_ntz)-date_part(epoch_second, ASSET_CHECKED_IN_AT::timestamp_ntz))/3600 as hour_diff_2
FROM VALUES
('2021-01-23 13:45:00','2021-01-23 13:45:00'),
('2021-01-23 13:45:00','2021-01-23 14:40:00'),
('2021-01-23 13:45:00','2021-01-23 14:44:00'),
('2021-01-23 13:59:59','2021-01-23 14:00:01'),
('2021-01-23 13:45:00','2021-01-24 13:45:00'),
('2021-01-23 13:45:00','2021-01-27 13:45:00'),
('2021-01-24 13:45:00','2021-01-23 13:45:00')
v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);
Gives:
ASSET_CHECKED_IN_AT | STATUS_CHANGED_TO_COMPLETE_HERE_AT | HOUR_DIFF | CASE_AGE_CATEGORY | HOUR_DIFF_2 |
---|---|---|---|---|
2021-01-23 13:45:00 | 2021-01-23 13:45:00 | 0 | >5 DAYS | 0 |
2021-01-23 13:45:00 | 2021-01-23 14:40:00 | 1 | 0-24 HOURS | 0.916667 |
2021-01-23 13:45:00 | 2021-01-23 14:44:00 | 1 | 0-24 HOURS | 0.983333 |
2021-01-23 13:59:59 | 2021-01-23 14:00:01 | 1 | 0-24 HOURS | 0.000556 |
2021-01-23 13:45:00 | 2021-01-24 13:45:00 | 24 | >5 DAYS | 24 |
2021-01-23 13:45:00 | 2021-01-27 13:45:00 | 96 | >5 DAYS | 96 |
2021-01-24 13:45:00 | 2021-01-23 13:45:00 | -24 | >5 DAYS | -24 |
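The HOUR_DIFF and HOUR_DIFF_2 columns above are consistent with TIMEDIFF('HOUR', ...) counting hour boundaries crossed rather than elapsed time. A minimal Python sketch of that behaviour, assuming naive timestamps in the format used above, reproduces the gaps in the original CASE. The helpers hour_boundaries and original_bucket are hypothetical names for illustration only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def hour_boundaries(start: str, end: str) -> int:
    """Hypothetical emulation of TIMEDIFF('HOUR', start, end):
    truncate both timestamps to the hour, then count whole hours between them."""
    s = datetime.strptime(start, FMT).replace(minute=0, second=0)
    e = datetime.strptime(end, FMT).replace(minute=0, second=0)
    return int((e - s).total_seconds()) // 3600

def original_bucket(h: int) -> str:
    """The question's CASE: strict > and < on both ends leave gaps at 0, 24, 48, 72, 96."""
    for lo, hi, label in [(0, 24, "0-24 HOURS"), (24, 48, "24-48 HOURS"),
                          (48, 72, "48-72 HOURS"), (72, 96, "72-96 HOURS"),
                          (96, 120, "96-120 HOURS")]:
        if lo < h < hi:
            return label
    return ">5 DAYS"  # also catches negatives and the exact boundary values

for start, end in [("2021-01-23 13:59:59", "2021-01-23 14:00:01"),
                   ("2021-01-23 13:45:00", "2021-01-24 13:45:00"),
                   ("2021-01-24 13:45:00", "2021-01-23 13:45:00")]:
    h = hour_boundaries(start, end)
    print(start, end, h, original_bucket(h))
```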
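Next, the WHEN clauses in a CASE are evaluated in the order they are written, so you can rely on that to halve the number of comparisons. Your SQL also becomes more readable if you make the time difference a top-level column.
Given that this feeds into another layer of SQL anyway, the extra column is useful and a fairly good way to keep the SQL readable.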
SELECT
ASSET_CHECKED_IN_AT,
STATUS_CHANGED_TO_COMPLETE_HERE_AT,
TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
CASE
WHEN hour_diff <0 then '-ve HOURS'
WHEN hour_diff <24 then '0-24 HOURS'
WHEN hour_diff <48 then '24-48 HOURS'
WHEN hour_diff <72 then '48-72 HOURS'
WHEN hour_diff <96 then '72-96 HOURS'
WHEN hour_diff <120 then '96-120 HOURS'
else '>=5 DAYS'
End as Case_Age_Category
FROM VALUES
('2021-01-23 13:45:00','2021-01-23 13:45:00'),
('2021-01-23 13:45:00','2021-01-23 14:40:00'),
('2021-01-23 13:45:00','2021-01-23 14:44:00'),
('2021-01-23 13:59:59','2021-01-23 14:00:01'),
('2021-01-23 13:45:00','2021-01-24 13:45:00'),
('2021-01-23 13:45:00','2021-01-27 13:45:00'),
('2021-01-24 13:45:00','2021-01-23 13:45:00')
v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);
ASSET_CHECKED_IN_AT | STATUS_CHANGED_TO_COMPLETE_HERE_AT | HOUR_DIFF | CASE_AGE_CATEGORY |
---|---|---|---|
2021-01-23 13:45:00 | 2021-01-23 13:45:00 | 0 | 0-24 HOURS |
2021-01-23 13:45:00 | 2021-01-23 14:40:00 | 1 | 0-24 HOURS |
2021-01-23 13:45:00 | 2021-01-23 14:44:00 | 1 | 0-24 HOURS |
2021-01-23 13:59:59 | 2021-01-23 14:00:01 | 1 | 0-24 HOURS |
2021-01-23 13:45:00 | 2021-01-24 13:45:00 | 24 | 24-48 HOURS |
2021-01-23 13:45:00 | 2021-01-27 13:45:00 | 96 | 96-120 HOURS |
2021-01-24 13:45:00 | 2021-01-23 13:45:00 | -24 | -ve HOURS |
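The same ordered-CASE bucketing can be checked in SQLite from Python. This is a sketch under two assumptions: SQLite has no TIMEDIFF, so the hour difference is approximated by truncating both timestamps to the hour and subtracting their epoch seconds; and the column names checked_in and completed are hypothetical shorthands for ASSET_CHECKED_IN_AT and STATUS_CHANGED_TO_COMPLETE_HERE_AT. The difference is computed in a CTE first, then the CASE uses only an upper bound per branch.

```python
import sqlite3

# The seven sample rows from the VALUES list above.
ROWS = [
    ("2021-01-23 13:45:00", "2021-01-23 13:45:00"),
    ("2021-01-23 13:45:00", "2021-01-23 14:40:00"),
    ("2021-01-23 13:45:00", "2021-01-23 14:44:00"),
    ("2021-01-23 13:59:59", "2021-01-23 14:00:01"),
    ("2021-01-23 13:45:00", "2021-01-24 13:45:00"),
    ("2021-01-23 13:45:00", "2021-01-27 13:45:00"),
    ("2021-01-24 13:45:00", "2021-01-23 13:45:00"),
]

QUERY = """
WITH diffs AS (
    SELECT rowid AS id, checked_in, completed,
           -- truncate both to the hour, then diff (approximates TIMEDIFF('HOUR', ...))
           (CAST(strftime('%s', strftime('%Y-%m-%d %H:00:00', completed)) AS INTEGER)
            - CAST(strftime('%s', strftime('%Y-%m-%d %H:00:00', checked_in)) AS INTEGER))
           / 3600 AS hour_diff
    FROM v
)
SELECT checked_in, completed, hour_diff,
       -- the first matching WHEN wins, so each branch needs only an upper bound
       CASE
           WHEN hour_diff < 0   THEN '-ve HOURS'
           WHEN hour_diff < 24  THEN '0-24 HOURS'
           WHEN hour_diff < 48  THEN '24-48 HOURS'
           WHEN hour_diff < 72  THEN '48-72 HOURS'
           WHEN hour_diff < 96  THEN '72-96 HOURS'
           WHEN hour_diff < 120 THEN '96-120 HOURS'
           ELSE '>=5 DAYS'
       END AS case_age_category
FROM diffs
ORDER BY id
"""

def categorise(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE v (checked_in TEXT, completed TEXT)")
    conn.executemany("INSERT INTO v VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in categorise(ROWS):
    print(row)
```

Because the branches are tested top to bottom, every integer hour_diff lands in exactly one bucket, including 0, the exact multiples of 24, and negative values.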
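Now for another way to compute the percentage: RATIO_TO_REPORT.
You can use the DIV0 method Lukasz showed (I deconstruct it into steps below), but RATIO_TO_REPORT does the heavy lifting for you. I show it in several forms to demonstrate that it works with both the deconstructed and the composed versions: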
With data AS (
SELECT *
FROM VALUES
('2021-01-23 13:45:00','2021-01-23 13:45:00'),
('2021-01-23 13:45:00','2021-01-23 14:40:00'),
('2021-01-23 13:45:00','2021-01-23 14:44:00'),
('2021-01-23 13:59:59','2021-01-23 14:00:01'),
('2021-01-23 13:45:00','2021-01-24 13:45:00'),
('2021-01-23 13:45:00','2021-01-27 13:45:00'),
('2021-01-24 13:45:00','2021-01-23 13:45:00')
v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT)
), cat_data AS (
SELECT
ASSET_CHECKED_IN_AT,
STATUS_CHANGED_TO_COMPLETE_HERE_AT,
TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
CASE
WHEN hour_diff <0 then '-ve HOURS'
WHEN hour_diff <24 then '0-24 HOURS'
WHEN hour_diff <48 then '24-48 HOURS'
WHEN hour_diff <72 then '48-72 HOURS'
WHEN hour_diff <96 then '72-96 HOURS'
WHEN hour_diff <120 then '96-120 HOURS'
else '>=5 DAYS'
End as Case_Age_Category
FROM data
)
SELECT Case_Age_Category
,COUNT(1) as cat_count
,SUM(cat_count) OVER() as total_count_a
,DIV0(cat_count, total_count_a) as percentage
,RATIO_TO_REPORT(cat_count) over () as percentage_2
,RATIO_TO_REPORT(COUNT(1)) over () as percentage_3
,RATIO_TO_REPORT(COUNT(Case_Age_Category)) over () as percentage_4
FROM cat_data
GROUP BY 1;
Gives:
CASE_AGE_CATEGORY | CAT_COUNT | TOTAL_COUNT_A | PERCENTAGE | PERCENTAGE_2 | PERCENTAGE_3 | PERCENTAGE_4 |
---|---|---|---|---|---|---|
0-24 HOURS | 4 | 7 | 0.571428 | 0.571429 | 0.571429 | 0.571429 |
24-48 HOURS | 1 | 7 | 0.142857 | 0.142857 | 0.142857 | 0.142857 |
-ve HOURS | 1 | 7 | 0.142857 | 0.142857 | 0.142857 | 0.142857 |
96-120 HOURS | 1 | 7 | 0.142857 | 0.142857 | 0.142857 | 0.142857 |
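RATIO_TO_REPORT(x) OVER () is the value x divided by SUM(x) OVER () across the window. SQLite has no RATIO_TO_REPORT, so this sketch (assuming SQLite 3.25+ and a hypothetical cat_data table holding the category labels from the table above) writes out that division explicitly. It computes the share both ways, composed in one step over the aggregate and deconstructed via an intermediate cat_count, to show they agree.

```python
import sqlite3

# Category labels produced by cat_data for the seven sample rows above.
CATEGORIES = ["0-24 HOURS"] * 4 + ["24-48 HOURS", "96-120 HOURS", "-ve HOURS"]

def shares(categories):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cat_data (case_age_category TEXT)")
    conn.executemany("INSERT INTO cat_data VALUES (?)", [(c,) for c in categories])
    rows = conn.execute(
        """
        WITH counted AS (
            SELECT case_age_category,
                   COUNT(*) AS cat_count,
                   -- composed: window over the aggregate in a single step
                   1.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS composed
            FROM cat_data
            GROUP BY case_age_category
        )
        SELECT case_age_category, cat_count, composed,
               -- deconstructed: divide the stored count by its windowed total
               1.0 * cat_count / SUM(cat_count) OVER () AS deconstructed
        FROM counted
        ORDER BY case_age_category
        """
    ).fetchall()
    conn.close()
    return rows

for row in shares(CATEGORIES):
    print(row)
```

With the sample data, '0-24 HOURS' gets 4/7 and each other bucket 1/7, matching the PERCENTAGE columns above up to display rounding.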