Using BigQuery's APPROX_QUANTILES efficiently
Right now, if I want to get the deciles of some value, I do this:
-- APPROX_QUANTILES(value, 100) returns 101 elements (min first, max last),
-- so zero-based SAFE_OFFSET(n) picks the n-th percentile.
SELECT
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(10)] AS p10,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(20)] AS p20,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(30)] AS p30,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(40)] AS p40,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(50)] AS p50,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(60)] AS p60,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(70)] AS p70,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(80)] AS p80,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(90)] AS p90,
  APPROX_QUANTILES(value, 100)[SAFE_OFFSET(100)] AS p100
FROM table
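I want to make sure this isn't making BigQuery do 10x the work, and I'd like to know whether there is a more compact way to write it.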
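If you run the query and then inspect the execution plan, you will see that BigQuery computes the quantiles only once and then extracts the various array elements in a second step. You don't need to worry about deduplicating the APPROX_QUANTILES aggregation yourself.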
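As for a more compact way to write it, one option is to compute the quantile array once in a subquery and index into the alias. This is only a sketch: `table` and `value` are the placeholder names from the question, and the alias `q` is made up for illustration. APPROX_QUANTILES(value, 100) returns 101 elements (the minimum first, the maximum last), so the sketch uses zero-based SAFE_OFFSET(n) to address the n-th percentile.

```sql
-- Compute the approximate quantile array once, then pick out the deciles.
-- `table` and `value` are placeholders from the question; `q` is a hypothetical alias.
SELECT
  q[SAFE_OFFSET(10)]  AS p10,
  q[SAFE_OFFSET(20)]  AS p20,
  q[SAFE_OFFSET(30)]  AS p30,
  q[SAFE_OFFSET(40)]  AS p40,
  q[SAFE_OFFSET(50)]  AS p50,
  q[SAFE_OFFSET(60)]  AS p60,
  q[SAFE_OFFSET(70)]  AS p70,
  q[SAFE_OFFSET(80)]  AS p80,
  q[SAFE_OFFSET(90)]  AS p90,
  q[SAFE_OFFSET(100)] AS p100  -- last element is the approximate maximum
FROM (
  SELECT APPROX_QUANTILES(value, 100) AS q
  FROM `table`
);
```

Since the planner already shares the aggregation, this form is mainly about readability rather than cost. If only deciles are needed, APPROX_QUANTILES(value, 10) should also work, returning 11 elements that can be indexed with SAFE_OFFSET(1) through SAFE_OFFSET(10), though the approximate values may differ slightly from the 100-bucket version.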