Percentiles in MariaDB

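I'm trying to find the 25th and 75th percentiles in MariaDB 10.4.11. Based on https://mariadb.com/kb/en/percentile_cont/ I believe the code below is the right approach, but it returns the same result for every calculation. Why?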

select name, 
    percentile_cont(0.25) within group (order by sell_price) over (partition by name) as percentile_25,
    percentile_cont(0.5) within group (order by sell_price) over (partition by name) as median,
    percentile_cont(0.75) within group (order by sell_price) over (partition by name) as percentile_75
from commodity
group by name;

Sample data:

market_id    name        sell_price 
3223191296   beer       175
128081144    beer       175
3225577472   beer       338
3228907520   beer       409
128666762    beer       600
3223210496   beer       646
3543674368   beer       647
3543674368   beer       647
3227117312   beer       690
3224189696   beer       704
3227711744   beer       709
128754255    beer       756
3223191296   coffee     1286
128081144    coffee     1286
3228907520   coffee     1601
3225577472   coffee     1694
128666762    coffee     1703
128754255    coffee     1842
3223210496   coffee     1892
3227117312   coffee     1928
3227711744   coffee     1956
3224189696   coffee     1965
3543674368   coffee     2245
3223891456   coffee     2733
3223891456   beer       4431

Expected results (made up):

name        percentile_25   median  percentile_75
beer        338             646     704
coffee      1694            1892    2245

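PERCENTILE_CONT is a window function, so it returns a value for every row of the result set instead of collapsing each group into one row. You can get the desired output by computing it in a derived table without GROUP BY, then grouping by name and taking the maximum of each expression (every row in a partition carries the same value, so MAX simply picks it up):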

SELECT
    name, 
    MAX(percentile_25) AS percentile_25, 
    MAX(median) AS median, 
    MAX(percentile_75) AS percentile_75
FROM
(
    SELECT
        name,
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS percentile_25,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS median,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY sell_price) OVER (PARTITION BY name) AS percentile_75
    FROM commodity
) t
GROUP BY name;
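As a cross-check, here is a minimal Python sketch of the linear interpolation PERCENTILE_CONT performs, applied to the sample data above. It assumes MariaDB follows the SQL-standard definition (take the fractional position p * (n - 1) in the ordered values and interpolate between the two neighbouring rows); the helper name percentile_cont is only for illustration and is not a MariaDB API.

```python
import math
from collections import defaultdict


def percentile_cont(values, p):
    """Continuous percentile: linearly interpolate between the two ordered
    values that surround the fractional position p * (n - 1)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # 0-based fractional position
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


# (name, sell_price) pairs copied from the sample data; market_id is not needed
rows = [
    ("beer", 175), ("beer", 175), ("beer", 338), ("beer", 409),
    ("beer", 600), ("beer", 646), ("beer", 647), ("beer", 647),
    ("beer", 690), ("beer", 704), ("beer", 709), ("beer", 756),
    ("coffee", 1286), ("coffee", 1286), ("coffee", 1601), ("coffee", 1694),
    ("coffee", 1703), ("coffee", 1842), ("coffee", 1892), ("coffee", 1928),
    ("coffee", 1956), ("coffee", 1965), ("coffee", 2245), ("coffee", 2733),
    ("beer", 4431),
]

# Equivalent of PARTITION BY name: collect every price per name
partitions = defaultdict(list)
for name, price in rows:
    partitions[name].append(price)

# One (p25, median, p75) tuple per name, like the outer GROUP BY name
results = {
    name: tuple(percentile_cont(prices, p) for p in (0.25, 0.5, 0.75))
    for name, prices in sorted(partitions.items())
}

for name, (p25, median, p75) in results.items():
    print(name, p25, median, p75)
# beer 409.0 647.0 704.0
# coffee 1670.75 1867.0 1958.25
```

Under that assumption, coffee has 12 rows, so its 25th percentile lands between the 3rd and 4th ordered values (1601 and 1694). That is why the corrected query would be expected to return fractional values rather than the made-up ones in the question.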

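percentile_cont() is a window function, not an aggregate function. Window functions are evaluated after GROUP BY, so grouping by name leaves a single row in each partition, and all three percentiles come out as that row's sell_price.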

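A simple solution is to use SELECT DISTINCT instead of GROUP BY. The window is then computed over all rows of each name, and DISTINCT removes the duplicate result rows: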

select distinct name, 
       percentile_cont(0.25) within group (order by sell_price) over (partition by name) as percentile_25,
       percentile_cont(0.50) within group (order by sell_price) over (partition by name) as median,
       percentile_cont(0.75) within group (order by sell_price) over (partition by name) as percentile_75
from commodity;
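To see why the GROUP BY version returns the same number three times while SELECT DISTINCT works, the Python sketch below simulates both evaluation orders on a small subset of the sample rows. It is a model of the logic, not MariaDB itself: which sell_price survives a GROUP BY on name is not defined for a non-aggregated column, so the sketch simply keeps the first one it sees. The percentile_cont helper is a hypothetical stand-in that assumes standard linear interpolation.

```python
import math
from collections import defaultdict


def percentile_cont(values, p):
    """Linear interpolation between the two ordered values around p * (n - 1)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


# A subset of the (name, sell_price) sample rows
rows = [("beer", 175), ("beer", 338), ("beer", 646), ("beer", 704),
        ("coffee", 1286), ("coffee", 1694), ("coffee", 1892), ("coffee", 2245)]
PS = (0.25, 0.5, 0.75)

# Original query: GROUP BY runs first and leaves one row per name
# (the surviving sell_price is arbitrary; here it is the first one).
grouped = {}
for name, price in rows:
    grouped.setdefault(name, price)
# Each window partition now holds a single value, so every percentile equals it.
group_by_first = {name: tuple(percentile_cont([price], p) for p in PS)
                  for name, price in grouped.items()}

# SELECT DISTINCT version: the window sees every row of its partition,
# each row gets the same percentiles, and DISTINCT drops the duplicates.
partitions = defaultdict(list)
for name, price in rows:
    partitions[name].append(price)
per_row = [(name,) + tuple(percentile_cont(partitions[name], p) for p in PS)
           for name, _ in rows]
distinct = sorted(set(per_row))

print(group_by_first)
print(distinct)
```

The first dictionary shows three identical values per name, which matches the symptom in the question; the DISTINCT list has one row per name with three different percentiles.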