How to apply NTILE(4) using range of column values?
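I'd like to use NTILE to look at how countries are distributed by forest area as a percentage of total land area. The values in the column I want to use range from 0.00053 to very close to 98.25, and the countries are not evenly distributed across the quartiles implied by that range, i.e. 0 to 25, 25 to 50, 50 to 75, and roughly 75 to 100. Instead, NTILE just splits the table into four groups with the same number of rows. How can I use NTILE to assign quantiles based on the values?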
SELECT country, forest, pcnt_forest,
NTILE(4) OVER(ORDER BY pcnt_forest) AS quartile
FROM percent_forest
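To see why NTILE behaves this way, here is a minimal Python sketch that emulates NTILE(n) OVER (ORDER BY ...): it sorts the rows and deals them into n groups whose sizes differ by at most one, without ever looking at the magnitude of the values. The function name `ntile` and the sample percentages are hypothetical, chosen only to mimic a skewed distribution.

```python
def ntile(n, values):
    """Emulate NTILE(n) OVER (ORDER BY value).

    Rows are sorted and split into n groups whose sizes differ by at most
    one; the first len(values) % n groups get the extra row. Only the row
    order matters, never the magnitude of the values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    size, extra = divmod(len(values), n)
    result = [0] * len(values)
    pos = 0
    for bucket in range(1, n + 1):
        count = size + (1 if bucket <= extra else 0)
        for i in order[pos:pos + count]:
            result[i] = bucket
        pos += count
    return result


# Hypothetical, heavily skewed forest percentages (0 to 100 scale).
pcnt = [0.5, 1.2, 3.0, 4.4, 7.9, 12.0, 60.0, 98.25]
print(ntile(4, pcnt))  # [1, 1, 2, 2, 3, 3, 4, 4]
```

Here 7.9 and 12.0 land in group 3 even though both sit in the lowest quarter of the 0 to 100 range, which is the mismatch described above. SQL engines split tied values arbitrarily; Python's stable sort simply keeps input order.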
The WIDTH_BUCKET function is a perfect fit for this:
WIDTH_BUCKET (Oracle) lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. (Compare this function with NTILE, which creates equiheight histograms.)
It is supported by Oracle, Snowflake, PostgreSQL, and others.
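WIDTH_BUCKET works the other way round: it ignores how many rows fall in each interval and only looks at where a value sits between the bounds. Below is a minimal Python sketch of its behavior for min < max, as I understand the Oracle and PostgreSQL documentation: values below the lower bound go to bucket 0, and values at or above the upper bound go to an overflow bucket n + 1. The function `width_bucket` is a hypothetical stand-in, not a library call. Since the question's column ranges from 0.00053 to 98.25, the bounds there would be 0 and 100 rather than 0 and 1.

```python
import math


def width_bucket(x, lo, hi, n):
    """Sketch of WIDTH_BUCKET(x, lo, hi, n) for lo < hi.

    [lo, hi) is cut into n equal-width intervals numbered 1..n, each closed
    on the left and open on the right. Values below lo return 0 and values
    at or above hi return n + 1.
    """
    if x < lo:
        return 0
    if x >= hi:
        return n + 1
    return math.floor((x - lo) * n / (hi - lo)) + 1


# Sample values from the output table in this answer (0 to 1 scale).
print([width_bucket(v, 0, 1, 4) for v in [.05, .06, .07, .49, .51, .96, .97, .98]])
# [1, 1, 1, 2, 3, 4, 4, 4]

# Endpoints quoted in the question (0 to 100 scale).
print(width_bucket(0.00053, 0, 100, 4), width_bucket(98.25, 0, 100, 4))  # 1 4
```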
Applied to your code:
SELECT country, pcnt_forest
,WIDTH_BUCKET(pcnt_forest, 0, 1, 4) AS w
,NTILE(4) OVER(ORDER BY pcnt_forest) AS ntile -- for comparison
FROM percent_forest
ORDER BY w
Output:
+----------+--------------+----+-------+
| COUNTRY  | PCNT_FOREST  | W  | NTILE |
+----------+--------------+----+-------+
| A        | .05          | 1  | 1     |
| B        | .06          | 1  | 1     |
| C        | .07          | 1  | 2     |
| E        | .49          | 2  | 2     |
| D        | .51          | 3  | 3     |
| F        | .96          | 4  | 3     |
| G        | .97          | 4  | 4     |
| H        | .98          | 4  | 4     |
+----------+--------------+----+-------+
You can use a case expression:
-- thresholds assume pcnt_forest is a fraction from 0 to 1; for a 0-100 column use 25, 50, 75
select pf.*,
       (case when pcnt_forest < 0.25 then 1
             when pcnt_forest < 0.50 then 2
             when pcnt_forest < 0.75 then 3
             else 4
        end) as bin
from percent_forest pf;
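One quick way to check the case expression is to run it against the sample rows from the output table above. This sketch uses Python's built-in sqlite3 module with a throwaway in-memory table; the rows are copied from that sample, and SQLite is just a convenient engine for the check, not something the question uses. It assumes pcnt_forest holds fractions between 0 and 1.

```python
import sqlite3

# Throwaway in-memory table holding the sample rows from the output table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE percent_forest (country TEXT, pcnt_forest REAL)")
conn.executemany(
    "INSERT INTO percent_forest VALUES (?, ?)",
    [("A", .05), ("B", .06), ("C", .07), ("E", .49),
     ("D", .51), ("F", .96), ("G", .97), ("H", .98)],
)

# The case expression from the answer, ordered by value for readability.
rows = conn.execute("""
    SELECT pf.country,
           (CASE WHEN pcnt_forest < 0.25 THEN 1
                 WHEN pcnt_forest < 0.50 THEN 2
                 WHEN pcnt_forest < 0.75 THEN 3
                 ELSE 4
            END) AS bin
    FROM percent_forest pf
    ORDER BY pf.pcnt_forest
""").fetchall()
conn.close()

for country, bin_ in rows:
    print(country, bin_)
```

For these rows the bins come out identical to the W column produced by WIDTH_BUCKET(pcnt_forest, 0, 1, 4).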
Or, more simply, use arithmetic:
-- assumes 0 <= pcnt_forest < 1; a value of exactly 1 would give bin 5
select pf.*,
       floor(pcnt_forest * 4) + 1 as bin
from percent_forest pf;
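The arithmetic version relies on floor(x * 4) + 1 mapping [0, 0.25) to 1, [0.25, 0.5) to 2, and so on. A small Python check, assuming pcnt_forest is a fraction, compares it against the case thresholds; the helper names are hypothetical. Multiplying by 4 is exact in binary floating point, so the two agree everywhere below 1.0, but a country that is exactly 100% forest lands in a fifth bin.

```python
import math


def case_bin(x):
    """Thresholds of the case expression: < 0.25, < 0.50, < 0.75, else 4."""
    if x < 0.25:
        return 1
    if x < 0.50:
        return 2
    if x < 0.75:
        return 3
    return 4


def arithmetic_bin(x):
    """floor(x * 4) + 1, as in the arithmetic query."""
    return math.floor(x * 4) + 1


def clamped_bin(x):
    """Hypothetical variant that folds x == 1.0 back into bin 4."""
    return min(math.floor(x * 4) + 1, 4)


# The two agree on a grid of fractions in [0, 1) ...
grid = [i / 1000 for i in range(1000)]
assert all(case_bin(x) == arithmetic_bin(x) for x in grid)

# ... but differ at exactly 1.0, where the arithmetic gives 5.
print(case_bin(1.0), arithmetic_bin(1.0), clamped_bin(1.0))  # 4 5 4
```

If the column is on a 0 to 100 scale, as the question's range suggests, the equivalent expression would be floor(pcnt_forest / 25) + 1, with the same caveat at exactly 100.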
I would not use the term "quartile" for this column. Quartiles imply four equal-sized bins (or at least as close to equal as duplicate values allow).