SQL/BigQuery Running Average with Gaps in Dates
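I'm having a problem with a moving average in BigQuery/SQL. I have a table 'SCORES' and I need to compute a 30-day moving average while grouping the data by user. The problem is that my dates are not consecutive, i.e. there are gaps in them.
Below is my current code: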
SELECT user, date,
AVG(score) OVER (PARTITION BY user ORDER BY date)
FROM SCORES;
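I don't know how to add the date restriction to that line, or whether this is even possible.
My current table looks like this, but of course with many more users: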
user date score
AA 13/02/2018 2.00
AA 15/02/2018 3.00
AA 17/02/2018 4.00
AA 01/03/2018 5.00
AA 28/03/2018 6.00
And I need it to become this:
user date score 30D Avg
AA 13/02/2018 2.00 2.00
AA 15/02/2018 3.00 2.50
AA 17/02/2018 4.00 3.00
AA 01/03/2018 5.00 3.50
AA 28/03/2018 6.00 5.50
In the last row, because of the dates, it only looks back one row (going back at most 30 days). Is there any way to do this in SQL, or am I asking too much?
You want to use range between. For that you need an integer, so:
select s.*,
avg(score) over (partition by user
order by days
range between 29 preceding and current row
) as avg_30day
from (select s.*, date_diff(s.date, date('2000-01-01'), day) as days
from scores s
) s;
An alternative to date_diff() is unix_date():
select s.*,
avg(score) over (partition by user
order by unix_days
range between 29 preceding and current row
) as avg_30day
from (select s.*, unix_date(s.date) as unix_days
from scores s
) s;
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
AVG(score) OVER (
PARTITION BY user
ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
) AS avg_30day
FROM `project.dataset.scores`
You can test / play with the above using the dummy data from your question:
#standardSQL
WITH `project.dataset.scores` AS (
SELECT 'AA' user, '13/02/2018' date, 2.00 score UNION ALL
SELECT 'AA', '15/02/2018', 3.00 UNION ALL
SELECT 'AA', '17/02/2018', 4.00 UNION ALL
SELECT 'AA', '01/03/2018', 5.00 UNION ALL
SELECT 'AA', '28/03/2018', 6.00
)
SELECT *,
AVG(score) OVER (
PARTITION BY user
ORDER BY UNIX_DATE(PARSE_DATE('%d/%m/%Y', date))
RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
) AS avg_30day
FROM `project.dataset.scores`
Result
Row user date score avg_30day
1 AA 13/02/2018 2.0 2.0
2 AA 15/02/2018 3.0 2.5
3 AA 17/02/2018 4.0 3.0
4 AA 01/03/2018 5.0 3.5
5 AA 28/03/2018 6.0 5.5
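If you want to sanity-check those numbers outside BigQuery, here is a rough sketch in plain Python of what the frame is doing. It is not part of either answer's query. The helper name rolling_avg and its days_back parameter are made up for illustration, and the sketch assumes the window means "this row's date and the 29 days before it, same user", which is how RANGE BETWEEN 29 PRECEDING AND CURRENT ROW behaves over a day number.

```python
from datetime import date, timedelta

# Sample rows from the question: (user, date, score).
rows = [
    ("AA", date(2018, 2, 13), 2.0),
    ("AA", date(2018, 2, 15), 3.0),
    ("AA", date(2018, 2, 17), 4.0),
    ("AA", date(2018, 3, 1), 5.0),
    ("AA", date(2018, 3, 28), 6.0),
]


def rolling_avg(rows, days_back=29):
    """Hypothetical helper: for each row, average the scores of the same
    user whose date falls within [row date - days_back, row date]."""
    out = []
    for user, day, _ in rows:
        start = day - timedelta(days=days_back)
        # Filter by date range, not by row count, so gaps are handled.
        window = [s for u, d, s in rows if u == user and start <= d <= day]
        out.append(sum(window) / len(window))
    return out


averages = rolling_avg(rows)
print(averages)  # [2.0, 2.5, 3.0, 3.5, 5.5]
```

For the last row the window starts on 27/02/2018, so only the 01/03 and 28/03 scores are averaged, which is where the 5.5 comes from. The loop is quadratic and only meant for checking a handful of rows by hand.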