Flatten the Data in BigQuery
I have a dimensions.key_value field of RECORD type. I ran the following query and got the following output.
SELECT * from table;
event_id value dimensions
1 140 {"key_value": [{"key": "app", "value": "20"}]}
2 150 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "1"}]}
3 600 {"key_value": [{"key": "region", "value": "8"}, {"key": "loc", "value": "2"}]}
To unnest the data, I created the following view:
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'app') as app,
value,
event_id
from table t
) select *
from temp;
My output:
region loc app value event_id
null null 20 140 1
8 1 null 150 2
8 2 null 600 3
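For anyone who wants to try this without access to the original table, here is a minimal sketch that inlines the three sample rows as a CTE. It assumes dimensions is a STRUCT holding a repeated key_value STRUCT with two STRING fields, as described above; the CTE name sample is made up for illustration.

```sql
-- Hypothetical inline copy of the three sample rows (the CTE name `sample` is an assumption).
with sample as (
  select 1 as event_id, 140 as value,
    struct([struct('app' as key, '20' as value)] as key_value) as dimensions
  union all
  select 2, 150,
    struct([struct('region' as key, '8' as value),
            struct('loc' as key, '1' as value)] as key_value)
  union all
  select 3, 600,
    struct([struct('region' as key, '8' as value),
            struct('loc' as key, '2' as value)] as key_value)
)
select
  -- Each scalar subquery returns the value for one known key, or NULL when the key is absent.
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'region') as region,
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'loc') as loc,
  (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'app') as app,
  t.value,
  t.event_id
from sample as t
order by t.event_id;
```

One thing to watch: a scalar subquery fails at run time if it returns more than one row, so if the same key can appear twice within one event, adding limit 1 (or an aggregate) inside each subquery avoids the error.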
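There are two things I need to verify:
Is my query correct?
How can I make the query generic, i.e. if I don't know all the keys and other keys may also be present in our dataset?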
Update:
My schema:
My output:
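Problem: suppose a user wants to group by region and loc. There is no easy way to write that query against the nested column, so I decided to create a view that lets users do the group by easily: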
with temp as (
select
(select value from t.dimensions.key_value where key = 'region') as region,
(select value from t.dimensions.key_value where key = 'loc') as loc,
(select value from t.dimensions.key_value where key = 'store') as store,
value,
metric_name, event_time
from table t
) select *
from temp;
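Based on this view, users can easily do a group by. So I want to check whether there is a way to create a generic view, since we don't know all the unique keys, or whether there is an easier way to do the group by.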
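To make the goal concrete, the kind of grouping users would run looks roughly like the sketch below. The table name `project.dataset.table` is a placeholder, and summing value is an assumed aggregation chosen only for illustration.

```sql
-- Group events by two known dimension keys.
-- `project.dataset.table` and sum(value) are assumptions for illustration.
select region, loc, sum(value) as total_value
from (
  select
    (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'region') as region,
    (select kv.value from unnest(t.dimensions.key_value) as kv where kv.key = 'loc') as loc,
    t.value
  from `project.dataset.table` as t
)
group by region, loc;
```

The inner select is the same flattening step as in the view; the outer query only aggregates, which is why every key still has to be named in advance.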
How can I make the query generic, i.e. if I don't know all the keys and other keys may also be present in our dataset?
Consider the following:
execute immediate (select
''' select event_id, value, ''' || string_agg('''
(select value from b.key_value where key = "''' || key_name || '''") as ''' || key_name , ''', ''')
|| '''
from (
select event_id, value,
array(
select as struct
json_extract_scalar(kv, '$.key') key,
json_extract_scalar(kv, '$.value') value
from a.kvs kv
) key_value
from `project.dataset.table`,
unnest([struct(json_extract_array(dimensions, '$.key_value') as kvs)]) a
) b
'''
from (
select distinct json_extract_scalar(kv, '$.key') key_name
from `project.dataset.table`,
unnest(json_extract_array(dimensions, '$.key_value')) as kv
)
)
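If applied to the sample data in your question, the output is:
As you can see in the query, there are no explicit references to actual key names. They are extracted dynamically, so there is no need to know them in advance, and there can be any number of them.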
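One caveat: EXECUTE IMMEDIATE is a procedural statement, so it cannot be the body of a view. If a view is required, a key-agnostic alternative is a long-format view with one row per event and key. The sketch below assumes dimensions is a RECORD with a repeated key_value field, as described in the question (the query above instead parses it as a JSON string), and the view name `project.dataset.dimensions_long` is hypothetical.

```sql
-- Long format: one row per (event, key) pair, with no key names hard-coded.
-- The view name is a made-up placeholder.
create or replace view `project.dataset.dimensions_long` as
select
  t.event_id,
  t.value,
  kv.key,
  kv.value as dim_value
from `project.dataset.table` as t
-- left join keeps events whose key_value array is empty or NULL
left join unnest(t.dimensions.key_value) as kv;
```

Users can then filter on key and group by dim_value for any key that shows up in the data. Because value is repeated once per key in this layout, an aggregate over it should be restricted to a single key to avoid double counting.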