Moving window average in PostgreSQL
I have a dataset in a CSV file that contains dates, categories, and values. However, there can be gaps in the dates. For example:
    Date       | Category   | Value
    -----------+------------+-------
    2016-01-01 | Category A | 6
    2016-01-02 | Category A | 7
    2016-01-03 | Category A | 4
    2016-01-01 | Category B | 4
    2016-01-01 | Category C | 16
    2016-01-02 | Category C | 8
    2016-01-02 | Category D | 5
I imported the data into a table in PostgreSQL.
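To make the example reproducible, a minimal setup might look like the sketch below. The table name `t` matches the answer's query; the column types and the shortened category labels (`A` instead of `Category A`, as in the answer's output) are assumptions, and the CSV path in the comment is hypothetical.

```sql
-- Hypothetical reproduction of the sample data; column types are assumptions.
create temp table t (
    date     date    not null,
    category text    not null,
    value    integer not null
);

-- With the real file, something like this would load it instead
-- (hypothetical path; server-side COPY reads a file on the database server):
--   copy t from '/path/to/data.csv' with (format csv, header true);
insert into t (date, category, value) values
    ('2016-01-01', 'A', 6),
    ('2016-01-02', 'A', 7),
    ('2016-01-03', 'A', 4),
    ('2016-01-01', 'B', 4),
    ('2016-01-01', 'C', 16),
    ('2016-01-02', 'C', 8),
    ('2016-01-02', 'D', 5);
```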
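For each category I need a rolling average over the past 7 days (simplified to the past 3 days in this example). However, missing dates must be filled in with 0 for every category. My first attempt was to add blank rows with 0 before computing the average: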
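    Select Seven_day.date,
           coalesce(data.value, Seven_day.blank_count),
           category
    from (select distinct GENERATE_SERIES(t.date - '6 day'::interval, t.date, '1 day'::interval)::date as date,
                 0 as blank_count
          from data t) as Seven_day
    -- Seven_day holds dates only (no category), so this join cannot
    -- create a row for a (date, category) pair that is absent from data
    left outer join data on data.date = Seven_day.date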
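However, this does not generate the correct blank rows, and it is very slow because my dataset is large.
Is there a better way to solve this? Could it be handled when the table itself is created, for example by automatically generating a date series with a default value of 0? Either way, the main problem here is how to handle the (date, category) pairs.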
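One way to get every (date, category) pair, sketched here under assumptions, is to cross join the distinct categories with a `generate_series` of days and then left join the real rows back on both columns. This is the same cross-join idea the answer's query uses. The `VALUES` CTE named `t` stands in for the imported table (labels shortened to A to D); `grid` and `gs(d)` are illustrative names.

```sql
-- Sketch only: the VALUES list stands in for the imported table t.
with t(date, category, value) as (
    values ('2016-01-01'::date, 'A', 6),
           ('2016-01-02'::date, 'A', 7),
           ('2016-01-03'::date, 'A', 4),
           ('2016-01-01'::date, 'B', 4),
           ('2016-01-01'::date, 'C', 16),
           ('2016-01-02'::date, 'C', 8),
           ('2016-01-02'::date, 'D', 5)
),
grid as (
    -- every category crossed with every day between the first and last date;
    -- generate_series returns timestamps here, so cast back to date
    select c.category, gs.d::date as date
    from (select distinct category from t) as c
    cross join generate_series(
        (select min(date) from t),
        (select max(date) from t),
        interval '1 day'
    ) as gs(d)
)
-- join the real rows back on both keys; pairs with no data become 0
select g.date, g.category, coalesce(t.value, 0) as value
from grid as g
left join t on t.date = g.date and t.category = g.category
order by g.category, g.date;
```

With the sample data this yields 4 categories times 3 days, i.e. 12 rows, 5 of which are filled-in zeros.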
I found a minimal solution:
    Select Seven_day.date,
           Seven_day.category as cat,
           coalesce(test.value, Seven_day.blank_count)
    from (select distinct GENERATE_SERIES(t.date - '6 day'::interval, t.date, '1 day'::interval)::date as date,
                 t.category,
                 0 as blank_count
          from test t
          order by t.category, date) as Seven_day
    left outer join test on test.date = Seven_day.date and test.category = Seven_day.category
    order by cat, date
Answer: a 3-day average that extends to any number of days (SQL Fiddle):
    select *
    from (
        select
            date, value, category,
            avg(value) over (
                partition by category
                order by date
                -- 3-row frame; use N - 1 preceding for an N-day average
                rows between 2 preceding and current row
            ) as average
        from (
            select date::date as date, coalesce(value, 0) as value, category
            from
                t
                right join
                ( -- computed table with all the possible dates x categories
                    (
                        select distinct category
                        from t
                    ) c
                    cross join
                    generate_series (
                        -- start 2 days early (window size - 1) so the
                        -- first real day already has a full window of rows
                        (select min(date) - 2 from t),
                        (select max(date) from t),
                        '1 day'
                    ) gs(date)
                ) s using(category, date)
        ) s
    ) s
    -- drop the padding days again
    where date >= (select min(date) from t)
    order by date, category
    ;
        date    | value | category |        average
    ------------+-------+----------+------------------------
     2016-01-01 |     6 | A        |     2.0000000000000000
     2016-01-01 |     4 | B        |     1.3333333333333333
     2016-01-01 |    16 | C        |     5.3333333333333333
     2016-01-01 |     0 | D        | 0.00000000000000000000
     2016-01-02 |     7 | A        |     4.3333333333333333
     2016-01-02 |     0 | B        |     1.3333333333333333
     2016-01-02 |     8 | C        |     8.0000000000000000
     2016-01-02 |     5 | D        |     1.6666666666666667
     2016-01-03 |     4 | A        |     5.6666666666666667
     2016-01-03 |     0 | B        |     1.3333333333333333
     2016-01-03 |     0 | C        |     8.0000000000000000
     2016-01-03 |     0 | D        |     1.6666666666666667
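As a check on the 3-day output above: A on 2016-01-02 is (0 + 6 + 7) / 3 ≈ 4.33, where the 0 is the padded row for 2015-12-31. Since the query is said to extend to any number of days, here is a sketch of the 7-day version the question actually asks for. It restructures the answer's query into CTEs and left joins the data onto the grid instead of using a RIGHT JOIN, which should be equivalent. Two numbers track the window size N: the padding `min(date) - (N - 1)` and the frame `N - 1 preceding`. Because `generate_series` has no date-only variant, the generated timestamps are cast back with `::date` before the join. The `VALUES` CTE named `t` stands in for the real table, and the sketch assumes at most one row per (date, category) in `t`, since a `rows` frame counts rows, not days.

```sql
-- Sketch: the answer's technique with a 7-day window (N = 7).
with t(date, category, value) as (
    values ('2016-01-01'::date, 'A', 6),
           ('2016-01-02'::date, 'A', 7),
           ('2016-01-03'::date, 'A', 4),
           ('2016-01-01'::date, 'B', 4),
           ('2016-01-01'::date, 'C', 16),
           ('2016-01-02'::date, 'C', 8),
           ('2016-01-02'::date, 'D', 5)
),
filled as (
    -- zero-filled grid, padded with N - 1 = 6 extra days before the first date
    select gs.d::date as date, c.category, coalesce(t.value, 0) as value
    from (select distinct category from t) as c
    cross join generate_series(
        (select min(date) - 6 from t),
        (select max(date) from t),
        interval '1 day'
    ) as gs(d)
    left join t on t.category = c.category and t.date = gs.d::date
),
averaged as (
    select date, category, value,
           avg(value) over (
               partition by category
               order by date
               rows between 6 preceding and current row  -- 7 rows in total
           ) as average
    from filled
)
select *
from averaged
where date >= (select min(date) from t)  -- drop the padding days
order by date, category;
```

With the sample data, A on 2016-01-03 should come out as (6 + 7 + 4) / 7 ≈ 2.43, because four of its seven days are padded zeros.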