PERCENTILE_CONT() returns the same value regardless of input parameter
I want to get the 5th, 50th, and 95th percentiles of a table:
SELECT col1, col2, col3, AVG(col4), STD(col4),
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 5th_percentile,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 50th_percentile,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
GROUP BY col1, col2, col3
LIMIT 100
I end up with 5th_percentile == 50th_percentile == 95th_percentile:
AVG(col4) STD(col4) 5th_percentile 50th_percentile 95th_percentile
300.000000 0.000000 300.000000 300.000000 300.000000
67.076600 16.968851 82.031792 82.031792 82.031792
66.166136 11.452172 78.348846 78.348846 78.348846
544.262809 68.269014 605.797302 605.797302 605.797302
22.523138 1.820358 24.000000 24.000000 24.000000
What's going on?
Edit: the database is MemSQL.
PERCENTILE_CONT() can be either an aggregate function or a window function, at least in some databases.
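I think what is happening is that the values are being calculated after the aggregation, though I'm not sure why. Honestly, I would have expected the code to produce a syntax error, because col4 is not aggregated. In other words, (ORDER BY MAX(col4)) should work but (ORDER BY col4) should not, because the percentiles are calculated after the aggregation.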
But try it without the OVER clause:
SELECT col1, col2, col3, AVG(col4), STD(col4),
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4) as 5th_percentile,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) as 50th_percentile,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4) as 95th_percentile
FROM table
GROUP BY col1, col2, col3
LIMIT 100;
EDIT:
It appears that your database doesn't support PERCENTILE_CONT() as an aggregation function. No accounting for taste; most databases do.
The workaround is to use SELECT DISTINCT:
SELECT DISTINCT col1, col2, col3,
AVG(col4) OVER (PARTITION BY col1, col2, col3),
STD(col4) OVER (PARTITION BY col1, col2, col3),
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3) as 5th_percentile,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3) as 50th_percentile,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4) OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
LIMIT 100;
Or use a subquery:
WITH a AS (
SELECT col1, col2, col3,
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 5th_percentile,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 50th_percentile,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY col4)
OVER (PARTITION BY col1, col2, col3) as 95th_percentile
FROM table
)
SELECT DISTINCT col1, col2, col3, 5th_percentile, 50th_percentile, 95th_percentile
FROM a
LIMIT 100
This works. It looks like you can't use GROUP BY together with percentile_cont.
Window functions run after the GROUP BY clause. GROUP BY produces one row per group, which is why the PERCENTILE_CONT window functions all return the same value.
You want to compute the window functions first and then apply the GROUP BY. You can do that by putting the window functions in an inner subselect and the GROUP BY in the outer select.
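To make the "window first, then GROUP BY" ordering concrete, here is a rough Python stand-in for the two query levels. It is only a sketch under some assumptions: the rows and the single grouping column are made up, and statistics.quantiles with method="inclusive" is assumed to match the linear interpolation that PERCENTILE_CONT performs. It is not MemSQL code.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical rows: (col1, col4). One grouping column keeps the sketch short.
rows = [
    ("a", 10.0), ("a", 20.0), ("a", 30.0), ("a", 40.0), ("a", 50.0),
    ("b", 1.0), ("b", 3.0),
]

# Inner select: the window step still sees every row of each partition.
partitions = defaultdict(list)
for col1, col4 in rows:
    partitions[col1].append(col4)


def window_percentiles(values):
    # With n=100, cut point i (list index i - 1) is the i-th percentile;
    # "inclusive" interpolates linearly between neighbouring sorted values.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[4], cuts[49], cuts[94]  # 5th, 50th, 95th


# Every input row is annotated with its partition's percentiles.
inner = [(col1, col4, *window_percentiles(partitions[col1])) for col1, col4 in rows]

# Outer select: group the annotated rows and aggregate.
grouped = defaultdict(list)
for col1, col4, p05, p50, p95 in inner:
    grouped[col1].append((col4, p05, p50, p95))

result = {
    col1: {
        "avg": mean([r[0] for r in group_rows]),
        # The percentile is identical on every row of a group, so MAX just picks it.
        "p05": max(r[1] for r in group_rows),
        "p50": max(r[2] for r in group_rows),
        "p95": max(r[3] for r in group_rows),
    }
    for col1, group_rows in grouped.items()
}

print(result)
```

In this sketch group "a" gets three different percentiles (roughly 12, 30 and 48) next to its average of 30, because the percentiles were computed while all five rows were still visible.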
Here is the documentation from Postgres that explains how window functions relate to GROUP BY (this is standard ANSI SQL, and MemSQL does the same thing):
https://www.postgresql.org/docs/current/static/tutorial-window.html
The rows considered by a window function are those of the "virtual table" produced by the query's FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
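Note that in MemSQL, if you use a column that is neither grouped nor aggregated (such as col4 in your query), you get an arbitrary value from the rows in the group, i.e. it behaves like the ANY_VALUE aggregate. In a future version of MemSQL, this query will return an error instead, to help you avoid writing queries with this kind of unexpected behavior.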
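The effect described in this answer can be imitated in a few lines of Python. This is a minimal sketch, not MemSQL's implementation: percentile_cont below is a hand-written helper that follows the usual linear-interpolation definition, the rows are invented, and "arbitrary value" is simulated by simply taking the last row of each group.

```python
from collections import defaultdict


def percentile_cont(values, fraction):
    """Linear-interpolation percentile (hypothetical helper, not a library call)."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)  # floor, since position is never negative
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical rows: (group key, col4).
rows = [("a", 10.0), ("a", 20.0), ("a", 30.0), ("a", 40.0), ("a", 50.0)]
fractions = (0.05, 0.5, 0.95)

partitions = defaultdict(list)
for key, col4 in rows:
    partitions[key].append(col4)

# Window evaluated before grouping: each partition holds all of its rows.
before_grouping = {
    key: [percentile_cont(vals, f) for f in fractions]
    for key, vals in partitions.items()
}

# Window evaluated after GROUP BY: one row per group is left, and the
# non-aggregated col4 is some arbitrary value from the group (here: the last).
collapsed = {key: [vals[-1]] for key, vals in partitions.items()}
after_grouping = {
    key: [percentile_cont(vals, f) for f in fractions]
    for key, vals in collapsed.items()
}

print(before_grouping)
print(after_grouping)
```

With a single row left in each partition, every fraction interpolates between that one value and itself, so all three percentiles come out identical. That matches the pattern in the question's output, where the three percentile columns are equal and do not line up with AVG(col4).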