Is there an easy way to calculate a 12-month moving average in PostgreSQL?
This very simple SQL calculates the average, median, etc. for well-defined time periods (e.g. year, month, quarter, week, day):
SELECT
date_trunc('year', t.time2), -- or hour, day, week, month, year
count(1),
percentile_cont(0.25) within group (order by t.price) as Q1,
percentile_cont(0.5) within group (order by t.price) as Q2,
percentile_cont(0.75) within group (order by t.price) as Q3,
avg(t.price) as A,
min(t.price) as Mi,
max(t.price) as Mx
FROM my_table AS t
GROUP BY 1
ORDER BY date_trunc
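The table contains a list of individual transactions, each with a date (timestamp) and a price (bigint).
However, I am struggling to adapt it to calculate running/moving values (e.g. over 4 weeks, 6 months, 2 quarters, or 12 months). How can this be achieved?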
EDIT
This is what the data looks like:
This is the expected result:
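EDIT 2:
Another issue I ran into is that the moving average, median, etc. should only be calculated over complete sets of data.
What I mean is that if the data series starts in January 2000, the first meaningful "12-month moving average" can only be calculated for December 2000 (i.e. the first month that has a full 12 months of data behind it). For a 3-month moving average, the first meaningful value would be for March 2000, and so on.
So I think the logic of this query should be:
1) determine the start and end dates used for calculating the statistics (average, median, etc.), and then
2) loop over each start-end date pair and calculate the average, median, etc.
To illustrate, the first part could be: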
WITH range_values AS ( -- get min and max values for the data series
SELECT date_trunc('month', min(time2)) as minval,
date_trunc('month', max(time2)) as maxval
FROM my_table),
period_range(d) AS ( -- generate complete list of periods eg. weeks, months, years for the data series
SELECT generate_series(minval, maxval, '1 month'::interval) as timeint
FROM range_values
),
lookup_range AS ( -- generate start-end date pairs based on the data series
select d as enddate, d- interval '11month' as startdate
from period_range
)
SELECT startdate, enddate
from lookup_range, range_values as p
where enddate >= p.minval + interval '11month'; -- clip date range to calculate 12 months avg using 12 months of data only
The second part could be (not a valid query, just to illustrate the logic):
SELECT
count(1),
percentile_cont(0.5) within group (order by t.price) as median_price,
avg(t.price) as avg_price
FROM my_table as t, lookup_range as l
WHERE t.time2>= 'startdate' AND t.time2 < 'enddate'
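Now, the challenge is how to combine the two, and how to make it work with the fewest lines of code.
I would aggregate by month first and then calculate the moving average: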
SELECT mon,
sum(s_price) OVER w / sum(c_price) OVER w
FROM (SELECT date_trunc('month', time2::timestamp) AS mon,
sum(price) AS s_price,
count(price) AS c_price
FROM my_table
GROUP BY date_trunc('month', time2::timestamp)) AS q
WINDOW w AS (ORDER BY mon
RANGE BETWEEN '6 months'::interval PRECEDING
AND '6 months'::interval FOLLOWING);
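Note that the window in the query above spans six months on either side of each row. If a trailing 12-month window is wanted, and only for months that have a full 12 months behind them (as asked in EDIT 2), a variant could look like the sketch below. It assumes the my_table(time2, price) layout from the question and PostgreSQL 11 or later (needed for RANGE with an interval offset); the column names moving_avg and first_mon are made up for illustration.

```sql
-- Sketch: trailing 12-month moving average over monthly aggregates,
-- clipped to months that have a full 12 months of history.
SELECT mon, moving_avg
FROM (SELECT mon,
             -- total price / total row count over the window
             sum(s_price) OVER w / sum(c_price) OVER w AS moving_avg,
             -- first month of the whole series (illustrative helper)
             min(mon) OVER () AS first_mon
      FROM (SELECT date_trunc('month', time2::timestamp) AS mon,
                   sum(price) AS s_price,
                   count(price) AS c_price
            FROM my_table
            GROUP BY 1) AS q
      WINDOW w AS (ORDER BY mon
                   RANGE BETWEEN '11 months'::interval PRECEDING
                             AND CURRENT ROW)) AS m
WHERE mon >= first_mon + interval '11 months'
ORDER BY mon;
```

This only covers the average: ordered-set aggregates such as percentile_cont cannot be used as window functions in PostgreSQL, so moving medians and quartiles need a join against a grid of date ranges instead.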
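If anyone is looking for a solution that calculates summary statistics such as moving averages, medians, and percentiles over 1, 2, 3, 4, ... 6, ... 12 years/quarters/months/weeks/days/hours in one go, here is the answer: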
WITH grid AS (
SELECT end_time, start_time
FROM (
SELECT end_time
, lag(end_time, 12, 'infinity') OVER (ORDER BY end_time) AS start_time
FROM (
SELECT
generate_series(date_trunc('month', min(time2))
, date_trunc('month', max(time2)) + interval '1 month', interval '1 month') AS end_time
FROM my_table
) sub
) sub2
WHERE end_time > start_time
)
SELECT
to_char(date_trunc('month',a.end_time - interval '1 month'), 'YYYY-MM') as d
, count(e.time2)
, percentile_cont(0.25) within group (order by e.price) as Q1
, percentile_cont(0.5) within group (order by e.price) as median
, percentile_cont(0.75) within group (order by e.price) as Q3
, avg(e.price) as Aver
, min(e.price) as Mi
, max(e.price) as Mx
FROM grid a
LEFT JOIN my_table e ON e.time2 >= a.start_time
AND e.time2 < a.end_time
GROUP BY end_time
ORDER BY d DESC
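Note that the table contains a list of individual timestamped records (such as sales transactions), as in the example in the actual question.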
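To see how the grid CTE clips the series to complete 12-month windows, the lag() step can be run on its own. This is only a sketch: it uses a hard-coded 14-month series in place of the min/max of my_table, and the alias g is made up for illustration.

```sql
-- Sketch: the start/end grid on a fixed series, 2000-01-01 to 2001-02-01.
SELECT end_time, start_time
FROM (SELECT end_time,
             -- the end_time 12 rows earlier; 'infinity' when there is none
             lag(end_time, 12, 'infinity') OVER (ORDER BY end_time) AS start_time
      FROM generate_series(timestamp '2000-01-01',
                           timestamp '2001-02-01',
                           interval '1 month') AS g(end_time)) AS sub
WHERE end_time > start_time  -- drops the rows where start_time is 'infinity'
ORDER BY end_time;
-- start_time 2000-01-01, end_time 2001-01-01 (window labelled 2000-12)
-- start_time 2000-02-01, end_time 2001-02-01 (window labelled 2001-01)
```

The first twelve rows have no row 12 positions earlier, so they get 'infinity' as start_time and are filtered out. Changing the lag offset (and the step of generate_series) should give other window lengths in the same way.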
Also, this bit:
to_char(date_trunc('month',a.end_time - interval '1 month'), 'YYYY-MM') as d
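is for display purposes only. That is, the convention in PostgreSQL is that the "end of the month" is actually hour 0 of the next month (i.e. the end of October 2019 is "2019.11.01 00:00:00"). The same applies to any time range (e.g. the end of 2019 is actually "2020.01.01 00:00:00"). So without the "- interval '1 month'", the 12-month moving statistics up to the end of October 2019 would be shown as being "for" 2019-11-01 00:00:00 (truncated to 2019-11).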
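A quick way to check the effect of the label shift is to format one boundary timestamp both ways. The column names raw_label and shifted_label are made up for illustration:

```sql
-- Sketch: the exclusive end of the October 2019 window, with and without the shift.
SELECT to_char(timestamp '2019-11-01 00:00:00', 'YYYY-MM') AS raw_label,
       to_char(timestamp '2019-11-01 00:00:00' - interval '1 month', 'YYYY-MM') AS shifted_label;
-- raw_label = 2019-11, shifted_label = 2019-10
```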