How to apply NTILE(4) using a range of column values?

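I want to use NTILE to look at how countries are distributed by forest land as a percentage of total land area. The values in the column I want to use range from 0.00053 to very nearly 98.25, and the countries are not evenly spread across the quartiles implied by that range, i.e. 0 to 25, 25 to 50, 50 to 75, and roughly 75 to 100. Instead, NTILE just splits the table into four groups with the same number of rows. How can I use NTILE to assign quantiles based on the values?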

SELECT country, forest, pcnt_forest,
       NTILE(4) OVER(ORDER BY pcnt_forest) AS quartile
FROM percent_forest

The WIDTH_BUCKET function is a good fit for this scenario:

WIDTH_BUCKET(Oracle) lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. (Compare this function with NTILE, which creates equiheight histograms.)

It is supported by Oracle, Snowflake, PostgreSQL, and others.
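To make the equal-width idea concrete, here is a minimal Python sketch of what such a function computes. `width_bucket` below is a hypothetical helper written for illustration, not a library call, and it only covers an ascending range. It mirrors the documented behaviour of returning bucket 0 for values below the lower bound and `count + 1` for values at or above the upper bound. Note that the question describes values from 0 to about 98.25, so on that scale the bounds would presumably be 0 and 100 rather than 0 and 1.

```python
import math


def width_bucket(value, low, high, count):
    """Emulate equal-width bucketing over the ascending range [low, high)."""
    if not low < high:
        raise ValueError("this sketch only handles low < high")
    if value < low:
        return 0              # underflow bucket
    if value >= high:
        return count + 1      # overflow bucket
    # Scale the value to [0, count) and number the buckets from 1.
    return math.floor((value - low) / (high - low) * count) + 1


if __name__ == "__main__":
    # Fractions on a 0..1 scale.
    for v in (0.05, 0.49, 0.51, 0.96):
        print(v, width_bucket(v, 0, 1, 4))
    # Percentages on a 0..100 scale, as described in the question.
    print(98.25, width_bucket(98.25, 0, 100, 4))
```

The upper bound is exclusive, so a value exactly equal to it lands in the overflow bucket rather than in the last regular bucket.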

Your code:

SELECT country,  pcnt_forest
       ,WIDTH_BUCKET(pcnt_forest, 0, 1, 4) AS w
       ,NTILE(4) OVER(ORDER BY pcnt_forest) AS ntile  -- for comparison
FROM percent_forest
ORDER BY w


Output:

+----------+--------------+----+-------+
| COUNTRY  | PCNT_FOREST  | W  | NTILE |
+----------+--------------+----+-------+
| A        |         .05  | 1  |     1 |
| B        |         .06  | 1  |     1 |
| C        |         .07  | 1  |     2 |
| E        |         .49  | 2  |     2 |
| D        |         .51  | 3  |     3 |
| F        |         .96  | 4  |     3 |
| G        |         .97  | 4  |     4 |
| H        |         .98  | 4  |     4 |
+----------+--------------+----+-------+

You can use a case expression:

select pf.*,
       (case when pcnt_forest < 0.25 then 1
             when pcnt_forest < 0.50 then 2
             when pcnt_forest < 0.75 then 3
             else 4
        end) as bin
from percent_forest pf;
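For anyone who wants to try this without a database server, the following is a small sketch that runs the case expression next to NTILE(4) in an in-memory SQLite database from Python. The eight sample rows, the `compare_bins` function, and the `ntile_group` alias are assumptions made for illustration, and the values are fractions on a 0 to 1 scale. It also assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: (country, forest fraction on a 0..1 scale).
ROWS = [
    ("A", 0.05), ("B", 0.06), ("C", 0.07), ("E", 0.49),
    ("D", 0.51), ("F", 0.96), ("G", 0.97), ("H", 0.98),
]


def compare_bins(rows):
    """Return (country, value-based bin, NTILE group) ordered by value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE percent_forest (country TEXT, pcnt_forest REAL)"
    )
    conn.executemany("INSERT INTO percent_forest VALUES (?, ?)", rows)
    query = """
        SELECT country,
               CASE WHEN pcnt_forest < 0.25 THEN 1
                    WHEN pcnt_forest < 0.50 THEN 2
                    WHEN pcnt_forest < 0.75 THEN 3
                    ELSE 4
               END AS bin,
               NTILE(4) OVER (ORDER BY pcnt_forest) AS ntile_group
        FROM percent_forest
        ORDER BY pcnt_forest
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in compare_bins(ROWS):
        print(row)
```

With these rows the two columns disagree for C and F: NTILE keeps the groups at two rows each, while the case expression follows the value ranges.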

Or, even simpler, use arithmetic:

select pf.*,
       floor(pcnt_forest * 4) + 1 bin
from percent_forest pf;
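One edge case is worth knowing about: with the arithmetic form, a value of exactly 1.0 (100% forest) comes out as bin 5, whereas the case expression's `else` branch puts it in bin 4. The Python sketch below shows the arithmetic and one possible cap; both function names are hypothetical, and the values are assumed to be fractions on a 0 to 1 scale.

```python
import math


def uncapped_bin(fraction, bins=4):
    """Direct translation of floor(fraction * bins) + 1."""
    return math.floor(fraction * bins) + 1


def arithmetic_bin(fraction, bins=4):
    """Same arithmetic, capped so the upper bound stays in the last bin."""
    return min(uncapped_bin(fraction, bins), bins)


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.9825, 1.0):
        print(f, uncapped_bin(f), arithmetic_bin(f))
```

In SQL the same cap could be written by wrapping the expression in `LEAST(..., 4)` in dialects that provide `LEAST`. If the column holds percentages on a 0 to 100 scale instead, dividing by 25 takes the place of multiplying by 4.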

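I would not use the term "quartile" for this column. A quartile implies four bins of equal size (or at least as close to equal as possible given duplicate values).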