BigQuery macros/repeated query parts

We use BigQuery to compute many daily metrics, but we are also always interested in longer-term averages (7-day, 14-day, 28-day, QTD, YTD).

We always compute them like this (ds is the date column):

SELECT
  AVG(metric_1d) OVER (
    ORDER BY ds
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS metric_7d,
  AVG(metric_1d) OVER (
    ORDER BY ds
    ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
  ) AS metric_14d,
  AVG(metric_1d) OVER (
    ORDER BY ds
    ROWS BETWEEN 27 PRECEDING AND CURRENT ROW
  ) AS metric_28d,
  AVG(metric_1d) OVER (
    -- CONCAT only accepts STRING/BYTES, so partition on year and quarter directly
    PARTITION BY EXTRACT(YEAR FROM ds), EXTRACT(QUARTER FROM ds)
    ORDER BY ds
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS metric_qtd,
  AVG(metric_1d) OVER (
    PARTITION BY EXTRACT(YEAR FROM ds)
    ORDER BY ds
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS metric_ytd,
  ds
FROM (
  SELECT
    ... AS metric_1d
    ...
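For readers less familiar with window frames, here is a rough Python sketch of what the frames above compute. It assumes one row per day, already sorted by ds; ROWS frames count rows rather than calendar days, so a missing day would silently stretch the "7-day" window. The helper names trailing_avg, running_avg_by, quarter_key and year_key are hypothetical and only illustrate the semantics; they are not BigQuery functions.

```python
from datetime import date, timedelta


def trailing_avg(values, n):
    """Average of the current row and up to n-1 preceding rows, like
    AVG(x) OVER (ORDER BY ds ROWS BETWEEN n-1 PRECEDING AND CURRENT ROW)."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - n + 1): i + 1]  # shorter frame at the start of the series
        out.append(sum(frame) / len(frame))
    return out


def running_avg_by(days, values, key):
    """Average from the first row of the current partition up to the current row, like
    AVG(x) OVER (PARTITION BY key(ds) ORDER BY ds
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    Assumes days are sorted, so each partition is a contiguous run."""
    out, total, count, current = [], 0.0, 0, None
    for ds, v in zip(days, values):
        if key(ds) != current:  # new partition: restart the running sum and count
            current, total, count = key(ds), 0.0, 0
        total += v
        count += 1
        out.append(total / count)
    return out


def quarter_key(ds):
    # Same grouping as year plus DIV(month - 1, 3): zero-based quarter within the year.
    return (ds.year, (ds.month - 1) // 3)


def year_key(ds):
    return ds.year
```

For example, with values 1 to 5 on 2020-12-30 through 2021-01-03, running_avg_by(days, values, year_key) restarts on 2021-01-01, while trailing_avg keeps averaging across the year boundary.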

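What I don't like is repeating essentially the same code in every metric query (sometimes several times, when more than one metric is computed). Is there a recommended way to simplify this, maybe with some kind of macro or UDF?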

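I don't see how macros (short of scripting, which would complicate the code even more) or UDFs would help here. Instead, I'd recommend the WINDOW clause. It addresses both concerns: it makes the code more readable, and it removes the redundancy when several metric/analytic calculations run over the same windows.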

So I would rewrite your code as follows:

select ds,
  avg(metric_1d) over last_7d as metric_7d,
  avg(metric_1d) over last_14d as metric_14d,
  avg(metric_1d) over last_28d as metric_28d,
  avg(metric_1d) over qtd as metric_qtd,
  avg(metric_1d) over ytd as metric_ytd
from your_table
window
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by extract(year from ds), extract(quarter from ds)
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (
    partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
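One way to convince yourself that named windows change only the spelling, not the results, is to run both forms side by side. The sketch below uses Python's built-in sqlite3 module as a local stand-in for BigQuery (SQLite also supports a WINDOW clause with named windows) and a hypothetical toy table. Because SQLite has no EXTRACT or DIV, the quarter key is written with strftime; that spelling is an assumption of this sketch, not BigQuery syntax.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical toy table: one row per day from 2021-03-25 to 2021-04-03,
# with metric_1d = 1.0, 2.0, ..., 10.0. ISO date strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ds TEXT, metric_1d REAL)")
start = date(2021, 3, 25)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), float(i + 1)) for i in range(10)],
)

# SQLite spelling of the quarter key: year plus zero-based quarter index.
QTD_KEY = "strftime('%Y', ds), (CAST(strftime('%m', ds) AS INTEGER) - 1) / 3"

# Window specifications repeated inline in every OVER (...), as in the question.
inline_sql = f"""
SELECT ds,
  avg(metric_1d) OVER (ORDER BY ds ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS metric_7d,
  avg(metric_1d) OVER (PARTITION BY {QTD_KEY}
                       ORDER BY ds ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS metric_qtd
FROM your_table
ORDER BY ds
"""

# The same windows defined once in a WINDOW clause and referenced by name.
named_sql = f"""
SELECT ds,
  avg(metric_1d) OVER last_7d AS metric_7d,
  avg(metric_1d) OVER qtd AS metric_qtd
FROM your_table
WINDOW
  last_7d AS (ORDER BY ds ROWS BETWEEN 6 PRECEDING AND CURRENT ROW),
  qtd AS (PARTITION BY {QTD_KEY}
          ORDER BY ds ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY ds
"""

inline_rows = conn.execute(inline_sql).fetchall()
named_rows = conn.execute(named_sql).fetchall()
print(named_rows == inline_rows)
```

In this toy data the quarter-to-date average restarts on 2021-04-01, while the 7-row average keeps sliding across the quarter boundary.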

If you add more metrics, such as sum or count, it stays just as simple:

select ds,
  avg(metric_1d) over last_7d as metric_7d,
  sum(metric_1d) over last_7d as metric2_7d,
  count(metric_1d) over last_7d as metric3_7d,
  avg(metric_1d) over last_14d as metric_14d,
  sum(metric_1d) over last_14d as metric2_14d,
  count(metric_1d) over last_14d as metric3_14d,
  avg(metric_1d) over last_28d as metric_28d,
  sum(metric_1d) over last_28d as metric2_28d,
  count(metric_1d) over last_28d as metric3_28d,
  avg(metric_1d) over qtd as metric_qtd,
  sum(metric_1d) over qtd as metric2_qtd,
  count(metric_1d) over qtd as metric3_qtd,
  avg(metric_1d) over ytd as metric_ytd,
  sum(metric_1d) over ytd as metric2_ytd,
  count(metric_1d) over ytd as metric3_ytd
from your_table
window
  last_7d  as (order by ds rows between  6 preceding and current row),
  last_14d as (order by ds rows between 13 preceding and current row),
  last_28d as (order by ds rows between 27 preceding and current row),
  qtd as (
    partition by extract(year from ds), extract(quarter from ds)
    order by ds rows between unbounded preceding and current row
  ),
  ytd as (
    partition by extract(year from ds)
    order by ds rows between unbounded preceding and current row
  )
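A small local check of the same idea with several aggregates sharing one definition, again using Python's sqlite3 as a stand-in for BigQuery and a hypothetical table. Since avg, sum and count all reference the single named window last_7d, each row's average should equal its sum divided by its count, which confirms the three aggregates see exactly the same frame.

```python
import sqlite3

# Hypothetical data: 2021-01-01 .. 2021-01-10 with values 10.0, 20.0, ..., 100.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ds TEXT, metric_1d REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(f"2021-01-{d:02d}", 10.0 * d) for d in range(1, 11)],
)

# avg, sum and count all reuse the single named window last_7d.
rows = conn.execute("""
SELECT ds,
  avg(metric_1d)   OVER last_7d AS metric_7d,
  sum(metric_1d)   OVER last_7d AS metric2_7d,
  count(metric_1d) OVER last_7d AS metric3_7d
FROM your_table
WINDOW last_7d AS (ORDER BY ds ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
ORDER BY ds
""").fetchall()

for ds, avg_7d, sum_7d, count_7d in rows:
    # count_7d grows from 1 to 7 and then stays at 7 once a full frame is available.
    print(ds, avg_7d, sum_7d, count_7d)
```

Changing the frame later (for example from 6 to 13 preceding rows) means editing one line in the WINDOW clause instead of every aggregate.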