BigQuery macros / repeated query parts
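We use BigQuery to calculate many daily metrics, but we are also always interested in longer-term averages (7-day, 14-day, 28-day, QTD, YTD).
We always compute them like this (ds: date):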
SELECT
AVG(metric_1d) OVER (
ORDER BY ds
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS metric_7d,
AVG(metric_1d) OVER (
ORDER BY ds
ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
) AS metric_14d,
AVG(metric_1d) OVER (
ORDER BY ds
ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
) AS metric_28d,
AVG(metric_1d) OVER (
PARTITION BY CONCAT(EXTRACT(YEAR FROM ds), DIV(EXTRACT(MONTH FROM ds)-1, 3))
ORDER BY ds
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_qtd,
AVG(metric_1d) OVER (
PARTITION BY EXTRACT(YEAR FROM ds)
ORDER BY ds
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS metric_ytd,
ds
FROM (
SELECT
... AS metric_1d
...
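What I don't like is repeating essentially the same code in every metric query (sometimes several times over, when more than one metric is computed).
Is there a recommended way to simplify this, perhaps with some kind of macro or UDF?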
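I don't see macros (short of scripting, which would complicate the code even further) or UDFs being of any help here. Instead, I can recommend the WINDOW clause. It addresses both concerns: it makes the code more readable, and it removes the redundancy when several metric/analytic calculations use the same windows.
So I would rewrite your code as follows: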
select ds,
avg(metric_1d) over last_7d as metric_7d,
avg(metric_1d) over last_14d as metric_14d,
avg(metric_1d) over last_28d as metric_28d,
avg(metric_1d) over qtd as metric_qtd,
avg(metric_1d) over ytd as metric_ytd,
from your_table
window
last_7d as (order by ds rows between 6 preceding and current row),
last_14d as (order by ds rows between 13 preceding and current row),
last_28d as (order by ds rows between 27 preceding and current row),
qtd as (
partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
order by ds rows between unbounded preceding and current row
),
ytd as (partition by extract(year from ds)
order by ds rows between unbounded preceding and current row
)
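The window definitions themselves can be trimmed a little further, because BigQuery lets one named window build on another. Below is a minimal sketch, assuming the same your_table and ds column as above. The base window names by_day, by_quarter and by_year are made up for illustration, and date_trunc(ds, quarter) is swapped in for the concat/div expression as the quarter key (it groups DATE values into the same quarters):

```sql
select ds,
  avg(metric_1d) over last_7d as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over qtd as metric_qtd,
  avg(metric_1d) over ytd as metric_ytd
from your_table
window
  -- hypothetical base windows: ordering and partitioning are written once
  by_day as (order by ds),
  by_quarter as (partition by date_trunc(ds, quarter)),
  by_year as (partition by extract(year from ds)),
  -- rolling windows only add a frame on top of by_day
  last_7d as (by_day rows between 6 preceding and current row),
  last_14d as (by_day rows between 13 preceding and current row),
  last_28d as (by_day rows between 27 preceding and current row),
  -- to-date windows add ordering and a frame on top of their partition
  qtd as (by_quarter order by ds rows between unbounded preceding and current row),
  ytd as (by_year order by ds rows between unbounded preceding and current row)
```

Each base window is defined before the windows that refer to it, and a window that extends another one only adds clauses the base does not already have (a named window cannot be combined with a new partition by).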
If you want to add more metrics, such as sum or count, it is as simple as the following:
select ds,
avg(metric_1d) over last_7d as metric_7d,
sum(metric_1d) over last_7d as metric2_7d,
count(metric_1d) over last_7d as metric3_7d,
avg(metric_1d) over last_14d as metric_14d,
sum(metric_1d) over last_14d as metric2_14d,
count(metric_1d) over last_14d as metric3_14d,
avg(metric_1d) over last_28d as metric_28d,
sum(metric_1d) over last_28d as metric2_28d,
count(metric_1d) over last_28d as metric3_28d,
avg(metric_1d) over qtd as metric_qtd,
sum(metric_1d) over qtd as metric2_qtd,
count(metric_1d) over qtd as metric3_qtd,
avg(metric_1d) over ytd as metric_ytd,
sum(metric_1d) over ytd as metric2_ytd,
count(metric_1d) over ytd as metric3_ytd,
from your_table
window
last_7d as (order by ds rows between 6 preceding and current row),
last_14d as (order by ds rows between 13 preceding and current row),
last_28d as (order by ds rows between 27 preceding and current row),
qtd as (
partition by concat(extract(year from ds), div(extract(month from ds)-1, 3))
order by ds rows between unbounded preceding and current row
),
ytd as (partition by extract(year from ds)
order by ds rows between unbounded preceding and current row
)
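One caveat about the rolling windows: rows between 6 preceding and current row counts rows, not days, so it only covers exactly 7 days if your_table has one row per ds with no gaps. If that does not hold for your data, a range frame over a numeric day number is one option. Here is a sketch under that assumption, using unix_date to turn ds into an integer day count and keeping the window names from the query above:

```sql
select ds,
  avg(metric_1d) over last_7d as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d
from your_table
window
  -- range frames need a numeric ordering key, so ds is converted to days since 1970-01-01;
  -- "6 preceding" then means "up to 6 calendar days before the current row's date"
  last_7d as (order by unix_date(ds) range between 6 preceding and current row),
  last_14d as (order by unix_date(ds) range between 13 preceding and current row),
  last_28d as (order by unix_date(ds) range between 27 preceding and current row)
```

With this variant the average is taken over whatever rows exist in the calendar range; missing days are skipped rather than counted as zero.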