How to resolve nested aggregate function error?

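I am using a CASE expression to bucket aggregated counts of items, and I want to divide each bucket's count by the sum of all counts (so that each bucket is shown as a percentage of the total). However, I get an error saying that aggregate functions cannot be nested. I understand why, but I need some help finding an alternative way to achieve my goal.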

Error:

Aggregate functions cannot be nested: [COUNT("values".CASE_AGE_CATEGORY)] nested in [SUM(COUNT("values".CASE_AGE_CATEGORY))]

Code:

SELECT Case_Age_Category, COUNT(Case_Age_Category)/sum(count(Case_Age_Category)) as Volume
FROM
(
SELECT DISTINCT(c.CASE_ID),c.CLOSED_AT,
    CASE
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >0
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <24 then '0-24 HOURS'
    
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >24
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <48 then '24-48 HOURS'
    
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >48
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <72 then '48-72 HOURS'
        
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >72
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <96 then '72-96 HOURS'
        
        WHEN TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) >96
        AND TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) <120 then '96-120 HOURS'
        
        else '>5 DAYS'
    End as Case_Age_Category
FROM TEST_DB.STAGING.DW_DECISIV_CASES c inner join DB.Seed.DEALER_MAPPING d on c.DEALER_ID = d.DECISIVDEALERID
WHERE d.DIVISION = 'K' 
    and RO_NUMBER is not NULL 
    and (d.DEALERCATEGORY ILIKE 'DEALER' OR d.DEALERCATEGORY ILIKE 'RTC') 
    and d.DEALERUSAGE ILIKE 'PRODUCTION' 
    and d.OWNERGROUPCODE !='S040'    
)

WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC

Screenshot of current output:

I am looking to have each bucket shown as a % of the total.

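Combine COUNT(...) with a windowed SUM() OVER() to get the total across all groups. Window functions are evaluated after GROUP BY, so the SUM here runs over the per-group counts rather than nesting one aggregate inside another: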

SELECT Case_Age_Category, 
      DIV0(COUNT(Case_Age_Category), SUM(COUNT(Case_Age_Category)) OVER()) as Volume
FROM
(
  -- ...
) sub
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC
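
To try the pattern without the real tables, here is a minimal sketch in Snowflake SQL. The inline VALUES rows and the bucket_count column are made up for illustration and stand in for the output of the inner subquery; only the shape of the query matters.

```sql
-- Hypothetical sample rows: one row per case, already labelled with its bucket.
SELECT Case_Age_Category,
       COUNT(*) AS bucket_count,                        -- per-bucket count (GROUP BY)
       DIV0(COUNT(*), SUM(COUNT(*)) OVER ()) AS Volume  -- count / total over all buckets
FROM VALUES
    ('0-24 HOURS'),
    ('0-24 HOURS'),
    ('24-48 HOURS'),
    ('>5 DAYS')
    v(Case_Age_Category)
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC;
```

With these four rows, '0-24 HOURS' accounts for two of the four cases, so its Volume should come out as one half and the other two buckets as one quarter each. DIV0 is used instead of a plain division so that an empty total yields 0 rather than a division-by-zero error.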


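Your time bucketing has a few problems.

  • You do not capture negative or zero hours, so those rows are classified as '>5 DAYS'
  • You never match exactly 24, 48, 72, or 96 hours, so those rows also fall through to '>5 DAYS'
  • TIMEDIFF counts hour boundaries crossed, so a difference of 2 seconds can be reported as 1 hour, which may or may not be a problem for you
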
SELECT 
    ASSET_CHECKED_IN_AT, 
    STATUS_CHANGED_TO_COMPLETE_HERE_AT,
    TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
    CASE
        WHEN hour_diff >0 AND hour_diff <24 then '0-24 HOURS'
        WHEN hour_diff >24 AND hour_diff <48 then '24-48 HOURS'
        WHEN hour_diff >48 AND hour_diff <72 then '48-72 HOURS'
        WHEN hour_diff >72 AND hour_diff <96 then '72-96 HOURS'
        WHEN hour_diff >96 AND hour_diff <120 then '96-120 HOURS'
        else '>5 DAYS'
    End as Case_Age_Category,
    (date_part(epoch_second, STATUS_CHANGED_TO_COMPLETE_HERE_AT::timestamp_ntz)-date_part(epoch_second, ASSET_CHECKED_IN_AT::timestamp_ntz))/3600 as hour_diff_2
FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);

gives:

ASSET_CHECKED_IN_AT  STATUS_CHANGED_TO_COMPLETE_HERE_AT  HOUR_DIFF  CASE_AGE_CATEGORY  HOUR_DIFF_2
2021-01-23 13:45:00  2021-01-23 13:45:00                 0          >5 DAYS            0
2021-01-23 13:45:00  2021-01-23 14:40:00                 1          0-24 HOURS         0.916667
2021-01-23 13:45:00  2021-01-23 14:44:00                 1          0-24 HOURS         0.983333
2021-01-23 13:59:59  2021-01-23 14:00:01                 1          0-24 HOURS         0.000556
2021-01-23 13:45:00  2021-01-24 13:45:00                 24         >5 DAYS            24
2021-01-23 13:45:00  2021-01-27 13:45:00                 96         >5 DAYS            96
2021-01-24 13:45:00  2021-01-23 13:45:00                 -24        >5 DAYS            -24

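Next, the WHEN clauses of a CASE are evaluated in the order they are written, so you can rely on that to halve the number of comparisons: once an earlier WHEN has ruled out the lower values, each later WHEN only needs an upper bound. If you also pull the time difference out into its own column, your SQL becomes more readable.

Given that this result feeds another layer of SQL anyway, the extra column costs nothing and is a fairly good way to make the SQL more readable.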

SELECT 
    ASSET_CHECKED_IN_AT, 
    STATUS_CHANGED_TO_COMPLETE_HERE_AT,
    TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
    CASE
        WHEN hour_diff <0 then '-ve HOURS'
        WHEN hour_diff <24 then '0-24 HOURS'
        WHEN hour_diff <48 then '24-48 HOURS'
        WHEN hour_diff <72 then '48-72 HOURS'
        WHEN hour_diff <96 then '72-96 HOURS'
        WHEN  hour_diff <120 then '96-120 HOURS'
        else '>=5 DAYS'
    End as Case_Age_Category
FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);

gives:

ASSET_CHECKED_IN_AT  STATUS_CHANGED_TO_COMPLETE_HERE_AT  HOUR_DIFF  CASE_AGE_CATEGORY
2021-01-23 13:45:00  2021-01-23 13:45:00                 0          0-24 HOURS
2021-01-23 13:45:00  2021-01-23 14:40:00                 1          0-24 HOURS
2021-01-23 13:45:00  2021-01-23 14:44:00                 1          0-24 HOURS
2021-01-23 13:59:59  2021-01-23 14:00:01                 1          0-24 HOURS
2021-01-23 13:45:00  2021-01-24 13:45:00                 24         24-48 HOURS
2021-01-23 13:45:00  2021-01-27 13:45:00                 96         96-120 HOURS
2021-01-24 13:45:00  2021-01-23 13:45:00                 -24        -ve HOURS
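
If the hour-boundary behaviour of TIMEDIFF('HOUR', ...) does turn out to matter, one possible variation (a sketch, not something the original query does) is to bucket on elapsed seconds divided by 3600. The elapsed_hours alias and the three sample rows below are made up for illustration.

```sql
-- Sketch: bucket on true elapsed time rather than hour boundaries crossed.
SELECT
    ASSET_CHECKED_IN_AT,
    STATUS_CHANGED_TO_COMPLETE_HERE_AT,
    -- elapsed_hours is an illustrative alias: fractional hours from a second-level difference
    TIMEDIFF('SECOND', ASSET_CHECKED_IN_AT::timestamp_ntz,
             STATUS_CHANGED_TO_COMPLETE_HERE_AT::timestamp_ntz) / 3600 as elapsed_hours,
    CASE
        WHEN elapsed_hours <0 then '-ve HOURS'
        WHEN elapsed_hours <24 then '0-24 HOURS'
        WHEN elapsed_hours <48 then '24-48 HOURS'
        WHEN elapsed_hours <72 then '48-72 HOURS'
        WHEN elapsed_hours <96 then '72-96 HOURS'
        WHEN elapsed_hours <120 then '96-120 HOURS'
        else '>=5 DAYS'
    End as Case_Age_Category
FROM VALUES
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),  -- 2 seconds apart
    ('2021-01-23 13:45:00','2021-01-24 13:44:59'),  -- 1 second short of 24 hours
    ('2021-01-23 13:45:00','2021-01-24 13:45:00')   -- exactly 24 hours
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT);
```

The middle row is the interesting one: it is one second short of a full day, so it should stay in '0-24 HOURS' here, whereas an hour-level TIMEDIFF would count 24 boundary crossings and move it to '24-48 HOURS'. Whether that distinction is wanted depends on how the buckets are defined by the business.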

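Now for another way to compute the percentage: RATIO_TO_REPORT.

You can use the DIV0 approach that Lukasz showed, which I have deconstructed below, but RATIO_TO_REPORT does the heavy lifting for you. I show it in several forms to demonstrate that it works equally well with the deconstructed and the composed versions: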

With data AS (
    SELECT * 
    FROM VALUES 
    ('2021-01-23 13:45:00','2021-01-23 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:40:00'),
    ('2021-01-23 13:45:00','2021-01-23 14:44:00'),
    ('2021-01-23 13:59:59','2021-01-23 14:00:01'),
    ('2021-01-23 13:45:00','2021-01-24 13:45:00'),
    ('2021-01-23 13:45:00','2021-01-27 13:45:00'),
    ('2021-01-24 13:45:00','2021-01-23 13:45:00')
    v(ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT)
), cat_data AS (
    SELECT 
        ASSET_CHECKED_IN_AT, 
        STATUS_CHANGED_TO_COMPLETE_HERE_AT,
        TIMEDIFF('HOUR', ASSET_CHECKED_IN_AT, STATUS_CHANGED_TO_COMPLETE_HERE_AT) as hour_diff,
        CASE
            WHEN hour_diff <0 then '-ve HOURS'
            WHEN hour_diff <24 then '0-24 HOURS'
            WHEN hour_diff <48 then '24-48 HOURS'
            WHEN hour_diff <72 then '48-72 HOURS'
            WHEN hour_diff <96 then '72-96 HOURS'
            WHEN  hour_diff <120 then '96-120 HOURS'
            else '>=5 DAYS'
        End as Case_Age_Category
    FROM data
)
SELECT Case_Age_Category
    ,COUNT(1) as cat_count
    ,SUM(cat_count) OVER() as total_count_a
    ,DIV0(cat_count, total_count_a) as percentage
    ,RATIO_TO_REPORT(cat_count) over () as percentage_2
    ,RATIO_TO_REPORT(COUNT(1)) over () as percentage_3
    ,RATIO_TO_REPORT(COUNT(Case_Age_Category)) over () as percentage_4
FROM cat_data
GROUP BY 1;

gives:

CASE_AGE_CATEGORY  CAT_COUNT  TOTAL_COUNT_A  PERCENTAGE  PERCENTAGE_2  PERCENTAGE_3  PERCENTAGE_4
0-24 HOURS         4          7              0.571428    0.571429      0.571429      0.571429
24-48 HOURS        1          7              0.142857    0.142857      0.142857      0.142857
-ve HOURS          1          7              0.142857    0.142857      0.142857      0.142857
96-120 HOURS       1          7              0.142857    0.142857      0.142857      0.142857
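
Since the goal in the question is to show each bucket as a % of the total, the fraction can be scaled and rounded in the same query. This is a minimal sketch with made-up inline rows; the pct_of_total alias and the choice of two decimal places are assumptions for illustration.

```sql
-- Hypothetical sample rows standing in for the categorised cases.
SELECT Case_Age_Category
    ,COUNT(1) as cat_count
    -- RATIO_TO_REPORT returns a fraction of the total; scale to a percentage and round
    ,ROUND(100 * RATIO_TO_REPORT(COUNT(1)) OVER (), 2) as pct_of_total
FROM VALUES
    ('0-24 HOURS'),
    ('0-24 HOURS'),
    ('24-48 HOURS'),
    ('96-120 HOURS')
    v(Case_Age_Category)
GROUP BY 1
ORDER BY 1;
```

With these four rows, '0-24 HOURS' should report 50 percent and the other two buckets 25 percent each.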