Flatten the Data in BigQuery

I have a dimensions.key_value column of record type. I ran the following query and got the following output.

SELECT *  from table;

   event_id   value     dimensions
     1        140      {"key_value": [{"key": "app", "value": "20"}]}
     2        150      {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]}
     3        600      {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]}
   

To unnest the data, I created the following view:

with temp as (
  select 
    (select value from t.dimensions.key_value where key = 'region') as region,
    (select value from t.dimensions.key_value where key = 'loc') as loc,
    (select value from t.dimensions.key_value where key = 'app') as app,
    value,
    event_id
    
  from table t
) select *
from temp;

My output:

region   loc    app    value   event_id
null     null   20     140     1
8        1      null   150     2
8        2      null   600     3

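There are two things I need help with. First, is my query correct? Second, how can I make the query generic, i.e. handle the case where I don't know all the keys and other keys may also be present in our dataset?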
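One way to check the first point is to run the same select against the three sample rows inlined as a CTE. The following is a minimal sketch, not a definitive test: it assumes dimensions is a STRUCT containing an ARRAY of STRUCT<key STRING, value STRING>, and the CTE name sample is made up for illustration.

```sql
-- Sample rows from the question, inlined so the query runs without a real table.
-- Assumption: dimensions is STRUCT<key_value ARRAY<STRUCT<key STRING, value STRING>>>.
with sample as (
  select 1 as event_id, 140 as value,
         struct([struct('app' as key, '20' as value)] as key_value) as dimensions
  union all
  select 2, 150,
         struct([struct('region' as key, '8' as value),
                 struct('loc' as key, '1' as value)] as key_value)
  union all
  select 3, 600,
         struct([struct('region' as key, '8' as value),
                 struct('loc' as key, '2' as value)] as key_value)
)
select
  -- Each scalar subquery picks the value for one known key, or null if absent.
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'region') as region,
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'loc') as loc,
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'app') as app,
  t.value,
  t.event_id
from sample as t
order by t.event_id;
```

One caveat with this pattern: a scalar subquery must return at most one row, so the query raises an error if the same key appears more than once within a single event's key_value array.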

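Update:

My schema:

My output:

Problem: suppose a user wants to group by region and loc. There is no easy way to write that query against the nested data, so I decided to create a view that lets users do the group by easily: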

with temp as (
  select 
    (select value from t.dimensions.key_value where key = 'region') as region,
    (select value from t.dimensions.key_value where key = 'loc') as loc,
    (select value from t.dimensions.key_value where key = 'store') as store,
    value,
    metric_name, event_time
    
  from table t
) select *
from temp;

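Based on this view, users can easily do a group by. So I want to check whether there is a way to create a generic view, since we don't know all the unique keys, or whether there is a simpler way to do the group by.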

How can I make the query generic, i.e. if I don't know all the keys, and some other keys may also be present in our dataset?

Consider the following:

execute immediate (select
  ''' select event_id, value, ''' || string_agg('''
    (select value from b.key_value where key = "''' || key_name || '''") as ''' || key_name , ''', ''')
  || '''
  from (
    select event_id, value,
      array(
        select as struct 
          json_extract_scalar(kv, '$.key') key, 
          json_extract_scalar(kv, '$.value') value
        from a.kvs kv
      ) key_value
    from `project.dataset.table`,
    unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
  ) b
  '''
  from (
    select distinct json_extract_scalar(kv, '$.key') key_name
    from `project.dataset.table`,
    unnest(json_extract_array(dimensions, '$.key_value')) as kv
  )
)    
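To make the mechanics concrete, here is a sketch of the statement that the outer select builds and hands to execute immediate, assuming the only keys present are app, region and loc as in the sample data. Whitespace is tidied for readability, and the column order may differ because string_agg is used without an order by.

```sql
-- Roughly the generated statement for keys app, region, loc
-- (one scalar subquery per distinct key found in the data).
select event_id, value,
  (select value from b.key_value where key = "app") as app,
  (select value from b.key_value where key = "region") as region,
  (select value from b.key_value where key = "loc") as loc
from (
  select event_id, value,
    array(
      select as struct
        json_extract_scalar(kv, '$.key') key,
        json_extract_scalar(kv, '$.value') value
      from a.kvs kv
    ) key_value
  from `project.dataset.table`,
  unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
) b
```

Note that the inner query parses dimensions with the JSON functions, so it treats the column as a JSON string rather than as a record.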

If applied to the sample data in your question, the output is

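As you can see in the query, there are no explicit references to the actual key names. They are extracted dynamically, so there is no need to know them in advance, and there can be any number of them.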
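Since the question asks for a view and describes dimensions as a record, the same idea can be adapted along these lines. This is only a sketch under assumptions: dimensions is taken to be a STRUCT with an ARRAY of STRUCT<key STRING, value STRING> (so no JSON parsing is needed), and the view name `project.dataset.flat_view` is hypothetical. A view cannot itself contain execute immediate, so the dynamic statement creates the view instead.

```sql
-- Sketch: build the pivoting select from the distinct keys and store it as a view.
-- Assumptions: dimensions is STRUCT<key_value ARRAY<STRUCT<key STRING, value STRING>>>;
-- `project.dataset.flat_view` is a made-up name for illustration.
execute immediate (
  select
    'create or replace view `project.dataset.flat_view` as '
    || 'select event_id, value, '
    || string_agg(
         -- One scalar subquery per key; backticks guard the generated column name.
         '(select kv.value from unnest(t.dimensions.key_value) as kv '
         || 'where kv.key = "' || key_name || '") as `' || key_name || '`',
         ', ' order by key_name)
    || ' from `project.dataset.table` as t'
  from (
    -- Discover the keys that currently exist in the data.
    select distinct kv.key as key_name
    from `project.dataset.table` as t, unnest(t.dimensions.key_value) as kv
  )
);
```

The resulting view only has columns for the keys that existed when the statement ran, so it would need to be re-run (for example on a schedule) to pick up new keys. Users can then write an ordinary group by region, loc against the view.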