How to output different 25th, 50th, 75th percentiles in single Teradata query?
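A few hours ago I was stuck on something similar and worked out some not-too-messy code that outputs the 25th, 50th and 75th percentiles in a single Teradata query. It can be extended to produce a "5-point summary". For the minimum and maximum, change the static values (0.01 and 0.99 below) to suit your population estimate.
Someone somewhere asked for an elegant way to do this, so I am sharing mine.
Here is the code: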
SELECT MAX(PER_MIN) AS PER_MIN,
       MAX(PER_25) AS PER_25,
       MAX(PER_50) AS PER_50,
       MAX(PER_75) AS PER_75,
       MAX(PER_MAX) AS PER_MAX
-- Each CASE is non-NULL only on the row whose rank equals the truncated target position.
FROM (SELECT CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.01 AS INT) THEN DURATION_MACRO_CURR END AS PER_MIN,
             CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.25 AS INT) THEN DURATION_MACRO_CURR END AS PER_25,
             CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.50 AS INT) THEN DURATION_MACRO_CURR END AS PER_50,
             CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.75 AS INT) THEN DURATION_MACRO_CURR END AS PER_75,
             CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.99 AS INT) THEN DURATION_MACRO_CURR END AS PER_MAX
      FROM PROD_EXP_DL_CVM.PROD_CVM
      WHERE PW_END_DATE = '2016-10-18'
     ) BASE;
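To make the row-picking arithmetic concrete, here is a small Python sketch of what the `ROW_NUMBER() = CAST(COUNT(*) * p AS INT)` test selects. It is only an illustration: the function name and the ten sample durations are made up, and it assumes the cast truncates toward zero. It also shows a caveat worth checking on your own data: with few rows, `COUNT(*) * 0.01` truncates to 0, no row number matches, and PER_MIN comes back NULL.

```python
from decimal import Decimal

def pick_rows_by_truncation(values, fractions):
    """Mimic ROW_NUMBER() = CAST(COUNT(*) * p AS INT) over sorted values.

    Hypothetical helper for illustration; assumes the cast truncates.
    """
    ordered = sorted(values)
    n = len(ordered)
    picked = {}
    for p in fractions:
        target = int(n * Decimal(p))  # int() truncates toward zero
        # ROW_NUMBER() starts at 1, so a target of 0 matches no row (NULL in SQL).
        picked[p] = ordered[target - 1] if target >= 1 else None
    return picked

durations = list(range(10, 110, 10))  # made-up data: 10, 20, ..., 100
print(pick_rows_by_truncation(durations, ["0.01", "0.25", "0.50", "0.75", "0.99"]))
# {'0.01': None, '0.25': 20, '0.50': 50, '0.75': 70, '0.99': 90}
```

With ten rows the 25th percentile target is 2.5, which truncates to row 2, so the query reports the second-smallest value.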
The desired output is a single row with the five columns PER_MIN, PER_25, PER_50, PER_75 and PER_MAX.
I would do this using conditional aggregation:
select min(DURATION_MACRO_CURR) as min_val,
       -- smallest value whose row number reaches 25% / 50% / 75% of the row count;
       -- aliases start with a letter because an unquoted name cannot begin with a digit
       min(case when seqnum / 0.25 >= cnt then DURATION_MACRO_CURR end) as percentile_25,
       min(case when seqnum / 0.50 >= cnt then DURATION_MACRO_CURR end) as percentile_50,
       min(case when seqnum / 0.75 >= cnt then DURATION_MACRO_CURR end) as percentile_75,
       max(DURATION_MACRO_CURR) as max_val
from (select pc.*,
             row_number() over (order by DURATION_MACRO_CURR) as seqnum,
             count(*) over () as cnt
      from PROD_EXP_DL_CVM.PROD_CVM pc
      where pc.PW_END_DATE = '2016-10-18'
     ) pc;
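As a rough check of what the conditional aggregation returns, the following Python sketch applies the same `seqnum / p >= cnt` rule to a sorted list. The helper name and the sample durations are made up for illustration, and the sketch ignores Teradata's decimal precision rules, which could matter in borderline cases.

```python
from decimal import Decimal

def pick_rows_by_conditional_min(values, fractions):
    """Mimic min(case when seqnum / p >= cnt then value end).

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    cnt = len(ordered)
    picked = {}
    for p in fractions:
        # Keep every value whose 1-based row number satisfies seqnum / p >= cnt.
        qualifying = [v for seqnum, v in enumerate(ordered, start=1)
                      if seqnum / Decimal(p) >= cnt]
        picked[p] = min(qualifying) if qualifying else None
    return picked

durations = list(range(10, 110, 10))  # made-up data: 10, 20, ..., 100
print(pick_rows_by_conditional_min(durations, ["0.25", "0.50", "0.75"]))
# {'0.25': 30, '0.50': 50, '0.75': 80}
```

The condition is equivalent to `seqnum >= p * cnt`, so the rule picks the first row at or past the target position (a ceiling) and always finds a row as long as the table is not empty.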