如何使用 PERCENTILE_CONT 和 GROUP BY id 计算单位价格中值
How to Calculate Median Price Per Unit Using PERCENTILE_CONT and GROUP BY id
我正在使用 postgres 9.5 并尝试计算 中位数 和 平均 价格单元 与 GROUP BY
id。这是 DBFIDDLE
中的查询
这是数据
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
使用 percentile_cont
这是我的查询:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
这个查询returns:
id| avg_price | median_price | avg_pp_unit | median_pp_unit
--+-----------+--------------+--------------+---------------
1 | 62 | 50 | 6 | 7
2 | 74 | 60 | 5 | 5
我很确定平均计算是正确的。这是计算 单位价格中值 的正确方法吗?
这 post 表明这是正确的(尽管性能很差)但我很好奇中位数计算中的除法是否会扭曲结果。
Calculating median with PERCENTILE_CONT and grouping
The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value.
https://en.wikipedia.org/wiki/Median
所以你的中位数价格是 55,中位数单位是 9
Sort by price Sort by units
id | price | units | | id | price | units
-------|-----------|--------| |-------|---------|----------
1 | 30 | 7 | | 1 | 30 | 7
1 | 40 | 8 | | 1 | 40 | 8
1 | 50 | 8 | | 1 | 50 | 8
>>> 2 | 50 | 11 | | 2 | 60 | 8 <<<<
>>> 2 | 60 | 8 | | 1 | 90 | 10 <<<<
1 | 90 | 10 | | 2 | 50 | 11
1 | 100 | 15 | | 1 | 100 | 15
2 | 110 | 22 | | 2 | 110 | 22
| | | | | |
(50+60)/2 (8+10)/2
55 9
我不确定你打算做什么 "median price per unit":
CREATE TABLE t(
id INTEGER NOT NULL
,price INTEGER NOT NULL
,units INTEGER NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
, percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
t;
| med_price | med_units
----|-----------|-----------
1 | 55 | 9
如果 "price" 列表示 "unit price",那么您不需要将 55 除以 9,但如果 "price" 是 "order total",则您需要除以单位:55/9 = 6.11
我正在使用 postgres 9.5 并尝试计算 中位数 和 平均 价格单元 与 GROUP BY
id。这是 DBFIDDLE
这是数据
id | price | units
-----+-------+--------
1 | 100 | 15
1 | 90 | 10
1 | 50 | 8
1 | 40 | 8
1 | 30 | 7
2 | 110 | 22
2 | 60 | 8
2 | 50 | 11
使用 percentile_cont
这是我的查询:
SELECT id,
ceil(avg(price)) as avg_price,
percentile_cont(0.5) within group (order by price) as median_price,
ceil( sum (price) / sum (units) ) AS avg_pp_unit,
ceil( percentile_cont(0.5) within group (order by price) /
percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id
这个查询returns:
id| avg_price | median_price | avg_pp_unit | median_pp_unit
--+-----------+--------------+--------------+---------------
1 | 62 | 50 | 6 | 7
2 | 74 | 60 | 5 | 5
我很确定平均计算是正确的。这是计算 单位价格中值 的正确方法吗?
这 post 表明这是正确的(尽管性能很差)但我很好奇中位数计算中的除法是否会扭曲结果。
Calculating median with PERCENTILE_CONT and grouping
The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value. https://en.wikipedia.org/wiki/Median
所以你的中位数价格是 55,中位数单位是 9
Sort by price Sort by units
id | price | units | | id | price | units
-------|-----------|--------| |-------|---------|----------
1 | 30 | 7 | | 1 | 30 | 7
1 | 40 | 8 | | 1 | 40 | 8
1 | 50 | 8 | | 1 | 50 | 8
>>> 2 | 50 | 11 | | 2 | 60 | 8 <<<<
>>> 2 | 60 | 8 | | 1 | 90 | 10 <<<<
1 | 90 | 10 | | 2 | 50 | 11
1 | 100 | 15 | | 1 | 100 | 15
2 | 110 | 22 | | 2 | 110 | 22
| | | | | |
(50+60)/2 (8+10)/2
55 9
我不确定你打算做什么 "median price per unit":
CREATE TABLE t(
id INTEGER NOT NULL
,price INTEGER NOT NULL
,units INTEGER NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);
SELECT
percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
, percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
t;
| med_price | med_units
----|-----------|-----------
1 | 55 | 9
如果 "price" 列表示 "unit price",那么您不需要将 55 除以 9,但如果 "price" 是 "order total",则您需要除以单位:55/9 = 6.11