从临时 table SQL 获取垃圾箱范围
Get bins range from temporary table SQL
我有一个与我之前的问题相关的问题。
我拥有的数据库如下所示:
category price date
-------------------------
Cat1 37 2019-03
Cat2 65 2019-03
Cat3 34 2019-03
Cat1 45 2019-03
Cat2 100 2019-03
Cat3 60 2019-03
这个数据库有数百个类别,并且来自另一个每个观察具有不同属性的数据库。
使用此代码:
WITH table AS
(
SELECT
category, price, date,
substring(date, 1, 4) AS year,
substring(date, 6, 2) as month
FROM
original_table
WHERE
(year = "2019" or year = "2020")
AND (month = "03")
AND product = "XXXXX"
ORDER BY
anno
)
-- I get this from a bigger table, but prefer to make small steps
-- that anyone in the fute can understand where this comes from as
-- the original table is expected to grow fast
SELECT
category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM
(SELECT
*,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM
table)
WHERE
next_date IS NOT NULL AND Pct_change >= 0
ORDER BY
Pct_change DESC
这段代码让我看到了如下所示的数据:
category Pct_change period
cat1 0.21 2019-2020
cat2 0.53 2019-2020
cat3 0.76 "
太棒了!但我的下一个视图必须采用这个,并为我提供一个范围,显示每个范围内有多少类别。
它应该看起来像:
range avg num_cat_in
[0.1- 0.4] 0.3 3
最后一个 table 只是我期望的一个例子
我一直在尝试使用看起来像这样的代码,但我什么也没得到
WITH table AS (
SELECT category, price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
FROM original_table
WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"
order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM (
SELECT *,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0
)
SELECT
count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]
谢谢!!!
比照。我的评论是,我首先假设您的范围不是硬编码的,并且您希望将数据平均分配到 Prc_change 的分位数。这意味着计算将计算出尽可能均匀地分割样本的范围。在这种情况下,以下将起作用(其中视图是您以前计算百分比的视图的名称):
select
concat('[',min(Pct_change),'-',min(Pct_change),']') as `range`
, avg(Pct_change) as `avg`
, count(*) as num_cat_in
from(
select *
, ntile(5)over(order by Pct_change) as bin
from theview
) t
group by bin
order by bin;
这里是a fiddle.
如果另一方面,您的范围是硬编码的,我假设范围在 table 中,例如我创建的范围:
create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);
(您必须确保范围不重叠。按照惯例,我包括从包含的下限到排除的上限的范围内的百分比,但包含的上限 1 除外。)然后是左连接 tables:
的问题
select
concat('[',lower,'-',upper,']') as `range`
, avg(Pct_change) as `avg`
, sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;
(请注意,在 upper=1
的位中,您必须将 1 更改为您的最高硬编码范围;这里我假设您的百分比介于 0 和 1 之间。)
这里是 second fiddle.
我有一个与我之前的问题相关的问题。
我拥有的数据库如下所示:
category price date
-------------------------
Cat1 37 2019-03
Cat2 65 2019-03
Cat3 34 2019-03
Cat1 45 2019-03
Cat2 100 2019-03
Cat3 60 2019-03
这个数据库有数百个类别,并且来自另一个每个观察具有不同属性的数据库。
使用此代码:
WITH table AS
(
SELECT
category, price, date,
substring(date, 1, 4) AS year,
substring(date, 6, 2) as month
FROM
original_table
WHERE
(year = "2019" or year = "2020")
AND (month = "03")
AND product = "XXXXX"
ORDER BY
anno
)
-- I get this from a bigger table, but prefer to make small steps
-- that anyone in the fute can understand where this comes from as
-- the original table is expected to grow fast
SELECT
category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM
(SELECT
*,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM
table)
WHERE
next_date IS NOT NULL AND Pct_change >= 0
ORDER BY
Pct_change DESC
这段代码让我看到了如下所示的数据:
category Pct_change period
cat1 0.21 2019-2020
cat2 0.53 2019-2020
cat3 0.76 "
太棒了!但我的下一个视图必须采用这个,并为我提供一个范围,显示每个范围内有多少类别。
它应该看起来像:
range avg num_cat_in
[0.1- 0.4] 0.3 3
最后一个 table 只是我期望的一个例子
我一直在尝试使用看起来像这样的代码,但我什么也没得到
WITH table AS (
SELECT category, price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
FROM original_table
WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"
order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM (
SELECT *,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0
)
SELECT
count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]
谢谢!!!
比照。我的评论是,我首先假设您的范围不是硬编码的,并且您希望将数据平均分配到 Prc_change 的分位数。这意味着计算将计算出尽可能均匀地分割样本的范围。在这种情况下,以下将起作用(其中视图是您以前计算百分比的视图的名称):
select
concat('[',min(Pct_change),'-',min(Pct_change),']') as `range`
, avg(Pct_change) as `avg`
, count(*) as num_cat_in
from(
select *
, ntile(5)over(order by Pct_change) as bin
from theview
) t
group by bin
order by bin;
这里是a fiddle.
如果另一方面,您的范围是硬编码的,我假设范围在 table 中,例如我创建的范围:
create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);
(您必须确保范围不重叠。按照惯例,我包括从包含的下限到排除的上限的范围内的百分比,但包含的上限 1 除外。)然后是左连接 tables:
的问题select
concat('[',lower,'-',upper,']') as `range`
, avg(Pct_change) as `avg`
, sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;
(请注意,在 upper=1
的位中,您必须将 1 更改为您的最高硬编码范围;这里我假设您的百分比介于 0 和 1 之间。)
这里是 second fiddle.