PERCENTILE_DISC() for non-round values
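I'm trying to find a solution, but without much luck.
In my query, I select count(*) and percentile_disc(.9) to find the 90th-percentile position.
The thing is, when the count is 29, the 90th percentile is closer to position 26 than to 27, but the 27th object is still returned.
Is there any way to say "if 5 < Nth < 10, subtract one from the result"?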
Table for reference:
ID  Count  90th
---------------
1   50     45
2   40     36
3   27     25   <-- Should be 24
4   9      9    <-- Should be 8
10% of 9 is 0.9, so one row should be removed, giving 8.
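To check that the "should be" column follows one consistent rule, here is a small Python calculation (Python is only a calculator here; the real query runs in Oracle). It assumes the intended rule is "drop 10% of the rows, rounded half up, and return the last remaining position"; `desired_position` is a hypothetical helper, not part of any library.

```python
from decimal import Decimal, ROUND_HALF_UP


def desired_position(count: int) -> int:
    """Hypothetical rule: drop round(10% of count) rows from the top, return the last one left."""
    # Decimal keeps 0.1 exact and lets us round half up, like Oracle's ROUND on positive numbers.
    dropped = (Decimal("0.1") * count).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return count - int(dropped)


# Counts from the reference table; the expected positions are the "should be" values.
for count in (50, 40, 27, 9):
    print(count, desired_position(count))
```

All four rows of the table (45, 36, 24, 8) are reproduced by this one rule.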
--- This is my understanding of the Nth percentile ---
What I have now:
My table gets a lot of new entries (100k+ per day), so I want to run this query daily.
Service_id start_time end_time
-------------------------------------
Service1 1499025651614 1499025651648
Service2 1499025655145 1499025655434
Service3 1499025656029 1499025656112
Service2 1499025658755 1499025659135
Service3 1499025726862 1499025728346
Service1 1499025748782 1499025750032
Service3 1499025749277 1499025749900
Service3 1499025757681 1499025758517
Service2 1499025775000 1499025775101
Service1 1499025785556 1499025785633
...
I want a query that selects the min, max and average for each service:
select mt.SERVICE_ID as SERVICE_ID,
count(*) as COUNT,
round(avg((mt.end_time - mt.start_time) / 1000), 2) as Avg,
round(min((mt.end_time - mt.start_time) / 1000), 2) AS Min,
round(max((mt.end_time - mt.start_time) / 1000), 2) AS Max
from myTable mt
group by mt.service_id
I want to combine this, using a join, with the 90th percentile discussed earlier:
select service_id, round(percentile_disc(.90) within group(order by elapsed), 2) as perc
from (select mt.service_id, ((mt.end_time - mt.start_time) / 1000) as elapsed
from myTable mt)
group by service_id
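The problem comes when the count is (let's say) 9. In this case MAX and Perc are the same (because the percentile doesn't remove anything), but in this particular case I need to remove the last one, giving me the timing in the 8th position as the result.
Is there any way to remove one more position in this case?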
PERCENTILE_DISC() does not do what you think it does.
Purpose
PERCENTILE_DISC is an inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the set. Nulls are ignored in the calculation.
...
For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.
Analytic Example
The following example calculates the median discrete percentile of the salary of each employee in the sample table hr.employees:
SELECT last_name, salary, department_id,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC)
OVER (PARTITION BY department_id) "Percentile_Disc",
CUME_DIST() OVER (PARTITION BY department_id
ORDER BY salary DESC) "Cume_Dist"
FROM employees where department_id in (30, 60);
LAST_NAME SALARY DEPARTMENT_ID Percentile_Disc Cume_Dist
------------- ---------- ------------- --------------- ----------
Raphaely 11000 30 2900 .166666667
Khoo 3100 30 2900 .333333333
Baida 2900 30 2900 .5
Tobias 2800 30 2900 .666666667
Himuro 2600 30 2900 .833333333
Colmenares 2500 30 2900 1
Hunold 9000 60 4800 .2
Ernst 6000 60 4800 .4
Austin 4800 60 4800 .8
Pataballa 4800 60 4800 .8
Lorentz 4200 60 4800 1
The median value for Department 30 is 2900, which is the value whose
corresponding percentile (Cume_Dist) is the smallest value greater
than or equal to 0.5. The median value for Department 60 is 4800,
which is the value whose corresponding percentile is the smallest
value greater than or equal to 0.5.
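In the example given in the documentation, if the percentile were set to 0.9 (instead of 0.5), you can see that CUME_DIST goes from 0.8 to 1 (for department 60), so PERCENTILE_DISC(0.9) ... would give 4200, as that is the value with the smallest CUME_DIST greater than or equal to 0.9. In this case, to get the second-to-last value, you would need a percentile of 0.8.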
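The rule quoted from the documentation can be modelled in a few lines of Python to double-check the department 30 and 60 figures. This is a sketch that assumes CUME_DIST means "rows sorting at or before this value (tied peers included) divided by the row count"; NULLs are skipped, as the documentation says. `cume_dist` and `percentile_disc` here are hypothetical helpers, not Oracle functions.

```python
from fractions import Fraction


def cume_dist(ordered, value, descending):
    """Fraction of rows that sort at or before `value` (tied peers included)."""
    at_or_before = sum(1 for x in ordered if (x >= value if descending else x <= value))
    return Fraction(at_or_before, len(ordered))


def percentile_disc(p, values, descending=False):
    """Return the value with the smallest CUME_DIST that is >= p (NULLs ignored)."""
    ordered = sorted((v for v in values if v is not None), reverse=descending)
    target = Fraction(str(p))  # exact decimal, e.g. "0.9" -> 9/10
    for v in ordered:  # CUME_DIST never decreases along the sort order
        if cume_dist(ordered, v, descending) >= target:
            return v
    return None  # empty input


dept_30 = [11000, 3100, 2900, 2800, 2600, 2500]
dept_60 = [9000, 6000, 4800, 4800, 4200]
for p in (0.5, 0.8, 0.9):
    print(p, percentile_disc(p, dept_30, descending=True),
          percentile_disc(p, dept_60, descending=True))
```

For department 60 this gives 4800 at 0.5 and 0.8, and 4200 at 0.9, matching the reasoning above.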
The issue comes when the count is (let's say) 9. In this case, the MAX and the Perc are the same (because the percentile is not removing anything), but in this particular case I need to remove the last one, giving me as the result the timing in the 8th position.
For 9 items, the CUME_DIST values of the rows are:
ROW_NUMBER CUME_DIST
---------- ---------
1 .111
2 .222
3 .333
4 .444
5 .556
6 .667
7 .778
8 .889
9 1.000
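If you use PERCENTILE_DISC( 0.9 ), then it looks for the value with the lowest CUME_DIST greater than or equal to 0.9, and there is only one such value, 1.000, which is also the maximum.
If you want a different value, you need to use a lower percentile.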
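The same arithmetic for the 9-row case, as a Python sketch that assumes nine distinct values (so row k has CUME_DIST k/9): 0.9 lands on row 9, while any percentile above 7/9 (about 0.778) and up to 8/9 (about 0.889) lands on row 8. The helper `row_for_percentile` is hypothetical.

```python
from fractions import Fraction

N = 9  # nine distinct values: row k has CUME_DIST k / 9


def row_for_percentile(p, n=N):
    """Smallest row number whose CUME_DIST is >= p, i.e. the row PERCENTILE_DISC(p) picks."""
    target = Fraction(str(p))
    return next(k for k in range(1, n + 1) if Fraction(k, n) >= target)


for p in (0.9, 0.88, 0.85, 0.8, 0.78, 0.77):
    print(p, row_for_percentile(p))
```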
Update:
You could try something like this:
select service_id,
elapsed as perc
from (
select service_id,
(end_time - start_time) / 1000 as elapsed,
ROW_NUMBER() OVER ( PARTITION BY service_id ORDER BY (end_time - start_time) )
AS rn,
COUNT(*) OVER ( PARTITION BY service_id ) AS ct
from myTable
)
WHERE rn = ROUND( 0.9 * ct );
Change the last line to use ROUND, FLOOR or CEIL, depending on your business logic. If I have worked out the logic correctly, CEIL will give the same answer as using PERCENTILE_DISC.
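To see how the three choices differ, here is a Python sketch (again only a calculator) that computes the row number picked by ROUND, FLOOR and CEIL of 0.9 * ct, next to the row PERCENTILE_DISC(0.9) would pick, assuming distinct elapsed values. `Decimal` keeps 0.9 * ct exact and rounds half up like Oracle's ROUND; the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP
from fractions import Fraction


def rn_for(ct, mode):
    """Row number chosen by ROUND/FLOOR/CEIL(0.9 * ct), depending on `mode`."""
    return int((Decimal("0.9") * ct).quantize(Decimal("1"), rounding=mode))


def percentile_disc_rn(ct, p="0.9"):
    """Smallest row number k with CUME_DIST k/ct >= p (distinct values assumed)."""
    target = Fraction(p)
    return next(k for k in range(1, ct + 1) if Fraction(k, ct) >= target)


print("ct ROUND FLOOR CEIL PERCENTILE_DISC")
for ct in (3, 7, 9, 21, 27, 29):
    print(ct, rn_for(ct, ROUND_HALF_UP), rn_for(ct, ROUND_FLOOR),
          rn_for(ct, ROUND_CEILING), percentile_disc_rn(ct))
```

CEIL agrees with PERCENTILE_DISC for every count, since both pick the smallest k with k/ct >= 0.9. Note that FLOOR gives row 0, i.e. no row at all, when ct = 1.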
What I need is: if the count is 7, remove the last record and return the 6th value (10% of 7 is 0.7, which rounds to 1); if the count is 21, remove the last 2 records and return the value at position 19 (10% of 21 is 2.1, which rounds to 2); and so on.
Using rn = ROUND( 0.9 * ct ):
- If the count is 7, then 0.9 * 7 = 6.3, so ROUND( 6.3 ) gives row 6.
- If the count is 21, then 0.9 * 21 = 18.9, so ROUND( 18.9 ) gives row 19.
- If the count is 3, then 0.9 * 3 = 2.7, so ROUND( 2.7 ) gives row 3 (the maximum).
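The three cases in the list can be checked with half-up rounding. The sketch below also compares ROUND(0.9 * ct) with the "drop ROUND(10% of ct) rows" phrasing from the question: the two agree except when 10% of the count ends in exactly .5 (ct = 5, 15, 25, ...), where ROUND(0.9 * ct) keeps one more row. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def oracle_round(x: Decimal) -> int:
    """Round to the nearest integer, ties away from zero (Decimal's ROUND_HALF_UP)."""
    return int(x.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def rn_round(ct):
    return oracle_round(Decimal("0.9") * ct)  # WHERE rn = ROUND(0.9 * ct)


def rn_drop_ten_percent(ct):
    return ct - oracle_round(Decimal("0.1") * ct)  # "remove round(10%) of the rows"


for ct in (7, 21, 3):
    print(ct, rn_round(ct))

# Counts up to 50 where the two rules pick different rows
print([ct for ct in range(1, 51) if rn_round(ct) != rn_drop_ten_percent(ct)])
```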
It is not clear what you want returned for small sets. If you never want the maximum row (unless there is only one row), then something like:
WHERE rn = GREATEST( 1, LEAST( ct - 1, ROUND( 0.9 * ct ) ) )
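Putting it together, the following Python sketch simulates the ROW_NUMBER/COUNT query with the clamped WHERE clause on the sample rows from the question: GREATEST and LEAST become max and min, ROW_NUMBER becomes a 1-based index into each service's sorted elapsed times, and ROUND rounds half up. It models the query's logic under those assumptions; it is not a substitute for running the SQL in Oracle.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# (service_id, start_time, end_time) in epoch milliseconds, from the sample in the question
rows = [
    ("Service1", 1499025651614, 1499025651648),
    ("Service2", 1499025655145, 1499025655434),
    ("Service3", 1499025656029, 1499025656112),
    ("Service2", 1499025658755, 1499025659135),
    ("Service3", 1499025726862, 1499025728346),
    ("Service1", 1499025748782, 1499025750032),
    ("Service3", 1499025749277, 1499025749900),
    ("Service3", 1499025757681, 1499025758517),
    ("Service2", 1499025775000, 1499025775101),
    ("Service1", 1499025785556, 1499025785633),
]


def clamped_rn(ct: int) -> int:
    """GREATEST(1, LEAST(ct - 1, ROUND(0.9 * ct)))."""
    rounded = int((Decimal("0.9") * ct).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return max(1, min(ct - 1, rounded))


elapsed_by_service = defaultdict(list)
for service_id, start, end in rows:
    elapsed_by_service[service_id].append(Decimal(end - start) / 1000)  # seconds

perc = {}
for service_id, elapsed in elapsed_by_service.items():
    elapsed.sort()  # ORDER BY (end_time - start_time)
    perc[service_id] = elapsed[clamped_rn(len(elapsed)) - 1]  # rn is 1-based

print(perc)
```

With the unclamped ROUND( 0.9 * ct ), every service in this small sample would return its maximum (rows 3, 3 and 4 respectively); the clamp returns the second-largest value instead.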