Quartiles in a SQL query
I have a very simple table like this:
CREATE TABLE IF NOT EXISTS LuxLog (
Sensor TINYINT,
Lux INT,
PRIMARY KEY(Sensor)
)
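It contains thousands of log entries from different sensors.
I would like to get Q1 and Q3 for all sensors.
I could run one query per sensor, but it would be better for me to have a single query covering all sensors (getting Q1 and Q3 back from one query).
I thought this would be a fairly simple operation, since quartiles are widely used and are among the main statistical variables in frequency calculations. In practice I found loads of overcomplicated solutions, whereas I was hoping for something neat and simple.
Can anyone give me a hint?
Edit: This is a piece of code I found online, but it does not work for me: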
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX(
GROUP_CONCAT( -- 1) make a sorted list of values
Lux
ORDER BY Lux
SEPARATOR ','
)
, ',' -- 2) cut at the comma
, 75/100 * COUNT(*) -- at the position beyond the 75% portion
)
, ',' -- 3) cut at the comma
, -1 -- right after the desired list entry
) AS `75th Percentile`
FROM LuxLog
WHERE Sensor=12
AND Lux<>0
I get 1 as the return value, whereas it should be a number divisible by 10 (10, 20, 30, ..., 1000).
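To make the intent of the nested SUBSTRING_INDEX calls concrete, here is a rough Python sketch of the same idea: build a sorted comma-separated list, keep everything up to the k-th comma, then take the last entry. The helper names `substring_index` and `percentile_from_list` are made up for this illustration, and rounding the fractional position half up is an assumption, not something the snippet above guarantees.

```python
import math


def substring_index(text, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th delimiter from the end
    return ""


def percentile_from_list(values, pct):
    """Pick the entry at the pct% position of the sorted list."""
    ordered = sorted(values)
    csv = ",".join(str(v) for v in ordered)           # like GROUP_CONCAT(... ORDER BY ...)
    # Assumption: the fractional position is rounded half up, and is at least 1.
    k = max(1, math.floor(pct / 100 * len(ordered) + 0.5))
    head = substring_index(csv, ",", k)               # cut after the k-th entry
    return int(substring_index(head, ",", -1))        # keep only the last entry


print(percentile_from_list([30, 10, 80, 20, 60, 40, 70, 50], 75))  # prints 60
```

With eight values the 75% position is entry 6 of the sorted list, so a multiple of 10 comes back, which is the kind of result the query is expected to produce.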
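Well, this is very simple with NTILE, but that is a Postgres function (MySQL only gained window functions such as NTILE in version 8.0). You basically just do something like this: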
SELECT value_you_are_NTILING,
NTILE(4) OVER (ORDER BY value_you_are_NTILING DESC) AS tiles
FROM
(SELECT math_that_gives_you_the_value_you_are_NTILING_here AS value_you_are_NTILING FROM tablename) AS t;
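Here is a simple example I made for you on SQLFiddle: http://sqlfiddle.com/#!15/7f05a/1
In MySQL you would use RANK... Here is the SQLFiddle for that: http://www.sqlfiddle.com/#!2/d5587/1 (this comes from the question linked below)
The use of RANK() in MySQL comes from this Stack Overflow question: Rank function in MySQL
Look for Salman A's answer.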
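For readers without a database at hand, this is roughly what NTILE(4) OVER (ORDER BY ... DESC) does, sketched in Python. The function name `ntile_desc` is hypothetical. The bucket sizing follows the usual NTILE rule (when the rows do not divide evenly, the first buckets get one extra row).

```python
def ntile_desc(values, buckets=4):
    """Assign each value a tile number, highest values first (hypothetical helper)."""
    ordered = sorted(values, reverse=True)       # ORDER BY value DESC
    base, extra = divmod(len(ordered), buckets)  # the first `extra` tiles get one more row
    result = []
    start = 0
    for tile in range(1, buckets + 1):
        size = base + (1 if tile <= extra else 0)
        for value in ordered[start:start + size]:
            result.append((value, tile))
        start += size
    return result


for value, tile in ntile_desc(range(1, 11)):
    print(value, tile)
```

With ten rows the tiles hold 3, 3, 2 and 2 rows, so the boundaries between tiles are where the quartile cut points fall.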
This should do it:
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
A full example follows:
use example;
drop table if exists luxlog;
CREATE TABLE LuxLog (
Sensor TINYINT,
Lux INT,
position int,
PRIMARY KEY(Position)
);
insert into luxlog values (0, 1, 10);
insert into luxlog values (0, 2, 20);
insert into luxlog values (0, 3, 30);
insert into luxlog values (0, 4, 40);
insert into luxlog values (0, 5, 50);
insert into luxlog values (0, 6, 60);
insert into luxlog values (0, 7, 70);
insert into luxlog values (0, 8, 80);
select count(*)*.25 from luxlog;
select count(*)*.50 from luxlog;
select
ll.*,
a.position,
b.position,
if(
a.position is not null, 1,
if (b.position is not null, 2, 0)
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux >= (select count(*)*0.00 from luxlog) and a.lux < (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux >= (select count(*)*0.25 from luxlog) and b.lux < (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux >= (select count(*)*0.50 from luxlog) and c.lux < (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux >= (select count(*)*0.75 from luxlog) and d.lux < (select count(*)*1.00 from luxlog)
;
select
ll.*,
if (a.position is not null, 1,
if (b.position is not null, 2,
if (c.position is not null, 3,
if (d.position is not null, 4, 0)))
) as quartile
from
luxlog ll
left outer join luxlog a on ll.position = a.position and a.lux > (select count(*)*0.00 from luxlog) and a.lux <= (select count(*)*0.25 from luxlog)
left outer join luxlog b on ll.position = b.position and b.lux > (select count(*)*0.25 from luxlog) and b.lux <= (select count(*)*0.50 from luxlog)
left outer join luxlog c on ll.position = c.position and c.lux > (select count(*)*0.50 from luxlog) and c.lux <= (select count(*)*0.75 from luxlog)
left outer join luxlog d on ll.position = d.position and d.lux > (select count(*)*0.75 from luxlog)
;
Or you could use a rank like this:
select
ll.*,
@curRank := @curRank + 1 as `rank`, -- backticks: RANK is a reserved word in MySQL 8.0
if (@curRank <= (select count(*)*0.25 from luxlog), 1,
if (@curRank <= (select count(*)*0.50 from luxlog), 2,
if (@curRank <= (select count(*)*0.75 from luxlog), 3, 4))
) as quartile
from
luxlog ll,
(SELECT @curRank := 0) r
;
This gives just one record per quartile:
select
x.quartile, group_concat(position)
from (
select
ll.*,
@curRank := @curRank + 1 as `rank`, -- backticks: RANK is a reserved word in MySQL 8.0
if (@curRank > 0 and @curRank <= (select count(*)*0.25 from luxlog), 1,
if (@curRank > 0 and @curRank <= (select count(*)*0.50 from luxlog), 2,
if (@curRank > 0 and @curRank <= (select count(*)*0.75 from luxlog), 3, 4))
) as quartile
from
luxlog ll,
(SELECT @curRank := 0) r
) x
group by quartile
+ ------------- + --------------------------- +
| quartile | group_concat(position) |
+ ------------- + --------------------------- +
| 1 | 10,20 |
| 2 | 30,40 |
| 3 | 50,60 |
| 4 | 70,80 |
+ ------------- + --------------------------- +
4 rows
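The grouping above can be checked by hand with a small Python sketch of the same thresholds. The function name `quartiles_by_rank` is hypothetical, and I am assuming the rows are ranked in the order of the position column, since the query itself has no ORDER BY.

```python
def quartiles_by_rank(positions):
    """Bucket rows into quartiles 1..4 by running rank (hypothetical helper)."""
    n = len(positions)
    groups = {}
    for rank, position in enumerate(positions, start=1):  # @curRank := @curRank + 1
        if rank <= n * 0.25:
            quartile = 1
        elif rank <= n * 0.50:
            quartile = 2
        elif rank <= n * 0.75:
            quartile = 3
        else:
            quartile = 4
        groups.setdefault(quartile, []).append(position)  # like GROUP_CONCAT(position)
    return groups


print(quartiles_by_rank([10, 20, 30, 40, 50, 60, 70, 80]))
# prints {1: [10, 20], 2: [30, 40], 3: [50, 60], 4: [70, 80]}
```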
Edit:
The sqlFiddle example from before the removal (http://sqlfiddle.com/#!9/a14a4/17) looked like this:
/*SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=101 ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;*/
See SqlFiddle: http://sqlfiddle.com/#!9/accca6/2/6
Note: for the sqlfiddle I generated 100 rows, one for each integer from 1 to 100, in random order (done in Excel).
Here is the code:
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
Edit:
SET @current_sensor := 101;
SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
SET @sql_q3 := (CONCAT('( SELECT "Q3" AS quartile_name , Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor,' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile,');'));
SET @sql := (CONCAT(@sql_q1,' UNION ',@sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
The basic reasoning is as follows:
For quartile 1 we want to get 25% from the top, so we need to know how many rows there are, that is:
SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
Now that we know the number of rows, we want to know what 25% of that is, which is this line:
SET @quartile := (ROUND(@number_of_rows*0.25));
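Then, to find a quartile, we order the LuxLog table by Lux and fetch row number "@quartile". To do that we set OFFSET to @quartile, meaning we want our select to start from row number @quartile, and we say LIMIT 1 because we only want to retrieve one row. That is: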
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name , Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile,')'));
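We do (almost) the same thing for the other quartile, but instead of starting from the top (from high values to low) we start from the bottom (which explains the ASC).
So far we have only stored strings in the variables @sql_q1 and @sql_q3, so we concatenate them with UNION to combine the results of both queries, then prepare the statement and execute it.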
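The reasoning above can be tried outside SQL with a short Python sketch. The name `pick_by_offset` and the result keys are hypothetical. Rounding half up stands in for ROUND on these positive values, which is my assumption. The SQL above labels the value taken from the top "Q1" and the one from the bottom "Q3"; many texts use the opposite naming, with Q1 as the lower value, so the sketch uses neutral keys.

```python
import math


def pick_by_offset(values, fraction=0.25):
    """Take one row at the same offset from each end (hypothetical helper)."""
    n = len(values)
    if n == 0:
        return None
    offset = math.floor(n * fraction + 0.5)          # SET @quartile := ROUND(n * 0.25)
    descending = sorted(values, reverse=True)        # ORDER BY Lux DESC
    ascending = sorted(values)                       # ORDER BY Lux ASC
    return {
        "from_top": descending[offset],              # LIMIT 1 OFFSET @quartile
        "from_bottom": ascending[offset],
    }


# 100 rows holding the integers 1..100, stored in reverse order here
print(pick_by_offset(list(range(100, 0, -1))))
# prints {'from_top': 75, 'from_bottom': 26}
```

Because OFFSET skips rows, OFFSET 25 returns the 26th row of each ordering, which is why the two values are 75 and 26 rather than 76 and 25.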
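Here is the query I came up with for calculating quartiles; it runs in ~0.04s with ~5000 table rows. I included the min/max values because I ultimately use this data to build the four quartile ranges: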
SELECT percentile_table.percentile, avg(ColumnName) AS percentile_values
FROM
(SELECT @rownum := @rownum + 1 AS `row_number`,
d.ColumnName
FROM PercentileTestTable d,
(SELECT @rownum := 0) r
WHERE ColumnName IS NOT NULL
ORDER BY d.ColumnName
) AS t1,
(SELECT count(*) AS total_rows
FROM PercentileTestTable d
WHERE ColumnName IS NOT NULL
) AS t2,
(SELECT 0 AS percentile
UNION ALL
SELECT 0.25
UNION ALL
SELECT 0.5
UNION ALL
SELECT 0.75
UNION ALL
SELECT 1
) AS percentile_table
WHERE
(percentile_table.percentile != 0
AND percentile_table.percentile != 1
AND t1.row_number IN
(
floor(( total_rows + 1 ) * percentile_table.percentile),
floor(( total_rows + 2 ) * percentile_table.percentile)
)
) OR (
percentile_table.percentile = 0
AND t1.row_number = 1
) OR (
percentile_table.percentile = 1
AND t1.row_number = total_rows
)
GROUP BY percentile_table.percentile;
Fiddle here: http://sqlfiddle.com/#!9/58c0e2/1
There are certainly performance issues; I would be glad to hear any feedback on how to improve this.
Sample data list:
3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
Sample query output:
| percentile | percentile_values |
|------------|-------------------|
| 0 | 3 |
| 0.25 | 4 |
| 0.5 | 10.5 |
| 0.75 | 15 |
| 1 | 18 |
I use this solution with a MySQL function:
x is the percentile you want
array_values is your GROUP_CONCAT of the values, sorted and separated by ,
DROP FUNCTION IF EXISTS centile;
delimiter $$
CREATE FUNCTION `centile`(x Text, array_values TEXT) RETURNS text
BEGIN
Declare DIFF_RANK TEXT;
Declare RANG_FLOOR INT;
Declare COUNT INT;
Declare VALEUR_SUP TEXT;
Declare VALEUR_INF TEXT;
SET COUNT = LENGTH(array_values) - LENGTH(REPLACE(array_values, ',', '')) + 1;
SET RANG_FLOOR = FLOOR(ROUND((x) * (COUNT-1),2));
SET DIFF_RANK = ((x) * (COUNT-1)) - FLOOR(ROUND((x) * (COUNT-1),2));
SET VALEUR_SUP = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+2),',',-1) AS DECIMAL);
SET VALEUR_INF = CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(array_values,',', RANG_FLOOR+1),',',-1) AS DECIMAL);
/****
https://fr.wikipedia.org/wiki/Quantile
x_j+1 + g (x_j+2 - x_j+1)
***/
RETURN Round((VALEUR_INF + (DIFF_RANK* (VALEUR_SUP-VALEUR_INF) ) ),2);
END$$
delimiter ;
Example:
Select centile(3/4,GROUP_CONCAT(lux ORDER BY lux SEPARATOR ',')) as quartile_3
FROM LuxLog
WHERE Sensor=12 AND Lux<>0
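For comparison, here is the same interpolation written as a Python sketch. It takes an already sorted list instead of a comma-separated string, and the parameter and variable names are mine rather than part of the function above. The rank is x * (n - 1), and the result interpolates linearly between the two neighbouring values.

```python
import math


def centile(x, sorted_values):
    """Linear-interpolation quantile, mirroring the SQL function above (a sketch)."""
    n = len(sorted_values)
    rank = x * (n - 1)                             # fractional 0-based position
    floor_rank = math.floor(round(rank, 2))        # RANG_FLOOR
    g = rank - floor_rank                          # DIFF_RANK, the fractional part
    lower = sorted_values[floor_rank]              # VALEUR_INF
    # Past the end, the SQL version falls back to the last entry; do the same here.
    upper = sorted_values[min(floor_rank + 1, n - 1)]  # VALEUR_SUP
    return round(lower + g * (upper - lower), 2)


print(centile(0.25, [10, 20, 30, 40]))      # prints 17.5
print(centile(0.75, [10, 20, 30, 40, 50]))  # prints 40.0
```

In the first call the rank is 0.75, so the result lies three quarters of the way from 10 to 20.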