Get bins range from temporary table SQL

I have a question related to my previous question.

The database I have looks like this:

category price  date
-------------------------    
Cat1       37   2019-03
Cat2       65   2019-03
Cat3       34   2019-03
Cat1       45   2019-03
Cat2       100  2019-03
Cat3       60   2019-03

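This database has hundreds of categories and comes from another database in which each observation has different attributes.

Using this code: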

WITH table AS 
(
    SELECT  
        category, price, date, 
        substring(date, 1, 4) AS year, 
        substring(date, 6, 2) as month
    FROM 
        original_table
    WHERE 
        (year = "2019" or year = "2020") 
        AND (month = "03") 
        AND product = "XXXXX"
    ORDER BY 
        anno
)
-- I get this from a bigger table, but prefer to make small steps 
-- that anyone in the future can understand where this comes from as 
-- the original table is expected to grow fast
SELECT 
    category,
    ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
    SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
    tipo_establecimiento
FROM 
    (SELECT 
         *,
         LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
         LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
         CASE 
            WHEN (category_2>= 35) AND (category_2 <= 61)
                THEN 'S'
            ELSE 'N'
         END 'tipo_establecimiento'
     FROM 
         table)
WHERE 
    next_date IS NOT NULL AND Pct_change >= 0
ORDER BY 
    Pct_change DESC

This code gives me a view of the data that looks like this:

category  Pct_change  period
cat1       0.21       2019-2020
cat2       0.53       2019-2020
cat3       0.76           "

Great! But my next view has to take this one and give me ranges showing how many categories fall within each range.

It should look like:

range       avg      num_cat_in
[0.1- 0.4]   0.3       3

This last table is just an example of what I expect.

I have been trying code that looks like this, but I get nothing:

   WITH table AS (
                SELECT  category,  price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
                FROM original_table
                WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"

                order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
       ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
       SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
       tipo_establecimiento
FROM (
  SELECT *,
         LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
         LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
         CASE 
        WHEN (category_2>= 35) AND (category_2 <= 61)
            THEN 'S'
        ELSE 'N'
    END 'tipo_establecimiento'
  FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0

)
SELECT 

count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]

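Thanks!!!

Cf. my comment, I first assume that your ranges are not hard-coded and that you want to split the data evenly into quantiles of Pct_change. This means the query itself works out the ranges that split the sample as evenly as possible. In that case, the following would work (where theview is the name of the view in which you previously computed the percentages):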

select
  concat('[',min(Pct_change),'-',max(Pct_change),']') as `range`
  , avg(Pct_change) as `avg`
  , count(*) as num_cat_in
from(
  select *
    , ntile(5)over(order by Pct_change) as bin
  from theview
) t
group by bin
order by bin;

Here is a fiddle.
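
To try the quantile approach without a database server, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. It is not the MySQL query above verbatim: SQLite concatenates with || instead of concat, and the rows in theview are made-up sample values, so treat the table contents and the choice of 3 bins as assumptions.

```python
import sqlite3

# Hypothetical sample data standing in for the view of percentage changes.
rows = [("cat1", 0.21), ("cat2", 0.53), ("cat3", 0.76),
        ("cat4", 0.10), ("cat5", 0.33), ("cat6", 0.48)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theview (category TEXT, Pct_change REAL)")
conn.executemany("INSERT INTO theview VALUES (?, ?)", rows)

# NTILE(3) orders the rows by Pct_change and assigns them to 3 bins of
# (nearly) equal size. Window functions need SQLite 3.25 or newer.
quantile_bins = conn.execute("""
    SELECT '[' || MIN(Pct_change) || '-' || MAX(Pct_change) || ']' AS "range",
           AVG(Pct_change) AS "avg",
           COUNT(*) AS num_cat_in
    FROM (
        SELECT Pct_change,
               NTILE(3) OVER (ORDER BY Pct_change) AS bin
        FROM theview
    ) AS t
    GROUP BY bin
    ORDER BY bin
""").fetchall()

for range_label, avg_change, num_cat_in in quantile_bins:
    print(range_label, round(avg_change, 3), num_cat_in)
```

With six sample rows and three bins, each bin holds two categories. The bin edges come from the data itself, so they shift whenever the view's contents change.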


If, on the other hand, your ranges are hard-coded, I assume the ranges are stored in a table, such as the one I create here:

create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);

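(You must make sure the ranges do not overlap. By convention, I include in each range the percentages from the lower bound (inclusive) to the upper bound (exclusive), except for the upper bound 1, which is inclusive.) Then it is a matter of left joining the tables:
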
select
  concat('[',lower,'-',upper,']') as `range`
  , avg(Pct_change) as `avg`
  , sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;

(Note that in the upper=1 bit you must change 1 to the upper bound of your highest hard-coded range; here I assume your percentages lie between 0 and 1.)

Here is a second fiddle.
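
The hard-coded variant can be sketched the same way in SQLite through Python's sqlite3 module. Again this is an adaptation rather than the MySQL query itself: the if(...) call is replaced by an OR condition, count(Pct_change) stands in for the sum(if(...)) expression, and the rows in theview are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theranges (lower REAL, upper REAL);
    INSERT INTO theranges VALUES (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);

    -- Hypothetical sample data standing in for the view of percentage changes.
    CREATE TABLE theview (category TEXT, Pct_change REAL);
    INSERT INTO theview VALUES
        ('cat1', 0.21), ('cat2', 0.53), ('cat3', 0.76),
        ('cat4', 0.25), ('cat5', 1.0);
""")

# The LEFT JOIN keeps every range, including empty ones.
# Each range is [lower, upper), except the last one, whose upper bound 1
# is treated as inclusive via the "OR r.upper = 1" branch.
fixed_bins = conn.execute("""
    SELECT r.lower, r.upper,
           AVG(v.Pct_change) AS "avg",
           COUNT(v.Pct_change) AS num_cat_in
    FROM theranges AS r
    LEFT JOIN theview AS v
           ON v.Pct_change >= r.lower
          AND (v.Pct_change < r.upper OR r.upper = 1)
    GROUP BY r.lower, r.upper
    ORDER BY r.lower
""").fetchall()

for lower, upper, avg_change, num_cat_in in fixed_bins:
    print(lower, upper, avg_change, num_cat_in)
```

COUNT(v.Pct_change) ignores the NULL produced by an unmatched range, so an empty range reports 0 categories and a NULL average instead of disappearing from the result.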