How to extract all values from variant/object in Snowflake?
A VARIANT column contains the following data:
[
{
"a": "1",
"b": "2",
"c": "3"
}
]
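Here the numbers are the values. If I wanted to extract all the keys as an array, I could use the OBJECT_KEYS function. But how can I extract all the values to get this output?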
[
"1",
"2",
"3"
]
Additional note: the keys, and the values mapped to them, are always the same. A more detailed example.
Input, 3 records/rows:
[
{
"a": "1",
"b": "2",
"c": "3"
}
]
[
{
"a": "1",
"b": "2",
"c": "3"
}
]
[
{
"a": "1",
"c": "3"
}
]
The output should be:
{"1", "2", "3"}
{"1", "2", "3"}
{"1", "3"}
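Depending on how fixed the shape of the data you describe is, there are a few ways to do this.
If, as in your example, the outer array has only one element, or it has several but you only need the first, you can first extract that element with mycol[0], then LATERAL FLATTEN it and ARRAY_AGG the VALUEs that the flatten produces.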
-- CTE to create data
with data as (Select parse_json('[
{
"a": "1",
"b": "2",
"c": "3"
},
{
"d": "4",
"e": "5",
"f": "6"
}
]' ) myCol)
-- Query
Select array_agg(first_inner_array.value) result
from data,
lateral flatten(input => mycol[0]) first_inner_array
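-- For a table with several rows, add: Group By first_inner_array.seq (one array per input row;
-- INDEX is NULL when flattening an object, so it cannot separate the rows)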
;
If you need to extract from several elements of the array, you can use two lateral flattens.
-- CTE to create data
with data as (Select parse_json('[
{
"a": "1",
"b": "2",
"c": "3"
},
{
"d": "4",
"e": "5",
"f": "6"
}
]' ) myCol)
-- Query
Select flat_outer_array.index, array_agg(flat_inner_array.value)
from data,
lateral flatten(input => mycol) flat_outer_array,
lateral flatten(input => flat_outer_array.value) flat_inner_array
-- Uncomment the where clause below to use this solution to pick the 1st element only
-- Where flat_outer_array.INDEX = 0
Group By flat_outer_array.index ;
You can also create a simple JavaScript function that extracts the VALUES from an object, the inverse of how OBJECT_KEYS returns the KEYS of the object. This avoids the flatten and array_agg operations.
create Or Replace Function OBJECT_VALUES(input_object object)
-- Pluck the Values from an Object and create an ARRAY.
-- Maintains the order of the OBJECT (KEYS) in the ARRAY.
returns ARRAY
language JAVASCRIPT
as
$$
// Return the values from the Object as an ARRAY
return Object.values(INPUT_OBJECT);
$$;
-- CTE to create data
with data as (Select parse_json('[
{
"a": "1",
"b": "2",
"c": "3"
},
{
"d": "4",
"e": "5",
"f": "6"
}
]' ) myCol)
Select OBJECT_VALUES(mycol[0]) from data
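To check what the UDF body computes without running it in Snowflake, here is a minimal plain-JavaScript sketch. The function name `objectValues` and the null guard are assumptions added for illustration; they are not part of the UDF above.

```javascript
// Hypothetical local stand-in for the UDF body; this is not Snowflake code.
function objectValues(inputObject) {
  // Guard (an assumption, not in the original UDF): treat null,
  // undefined, and non-object inputs as having no values.
  if (inputObject === null || typeof inputObject !== "object") {
    return [];
  }
  // Object.values returns the object's own enumerable property values.
  return Object.values(inputObject);
}

const firstElement = { a: "1", b: "2", c: "3" };
console.log(objectValues(firstElement)); // [ '1', '2', '3' ]
```

Whether you want a guard like this in the real UDF depends on how NULL inputs should be reported, so treat it as optional.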
Or, if you need several elements of the ARRAY:
-- CTE to create data
with data as (Select parse_json('[
{
"a": "1",
"b": "2",
"c": "3"
},
{
"d": "4",
"e": "5",
"f": "6"
}
]' ) myCol)
-- Query
Select flat_outer_array.index, OBJECT_VALUES(value)
from data,
lateral flatten(input => mycol) flat_outer_array
Group By 1,2;
The performance of these options may vary with the size and shape of your data, so you may want to try each of them.
One object per array:
Assuming the data has only one object per array:
With data as (
select parse_json(column1) as json
from values
('[{"a": "1","b": "2","c": "3"}]'),
('[{"a": "1","b": "2","c": "3"}]'),
('[{"a": "1","c": "3"}]')
)
select
'{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data, table(flatten(json[0]))v
group by v.seq
order by v.seq
This gives:
OUTPUT
{"1","2","3"}
{"1","2","3"}
{"1","3"}
Multiple objects per array, merged:
With data as (
select parse_json(column1) as json
from values
('[{"a": "1","b": "2","c": "3"},{"a": "1","d": "4","e": "5"}]'),
('[{"a": "1","b": "2","c": "3"}]'),
('[{"a": "1","c": "3"}]')
)
select
'{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data
,table(flatten(json))a
,table(flatten(a.value))v
group by a.seq
order by a.seq
This gives:
OUTPUT
{"1","2","3","4","5"}
{"1","2","3"}
{"1","3"}
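The merge logic of that query (two flattens, DISTINCT, ordered LISTAGG, wrapped in braces) can be mirrored in plain JavaScript to reason about the expected result. This is only a sketch: the helper name `mergedValues` is hypothetical, and it assumes simple string values whose sort order matches between JavaScript and Snowflake.

```javascript
// Hypothetical helper that mirrors the query's logic; this is not Snowflake code.
function mergedValues(row) {
  // Flatten the outer array, then each object's values (the two FLATTENs),
  // and wrap every value in double quotes as the query does.
  const quoted = row
    .flatMap((obj) => Object.values(obj))
    .map((value) => `"${value}"`);
  // DISTINCT, then ORDER BY the quoted string, then join with commas.
  const distinctSorted = [...new Set(quoted)].sort();
  return `{${distinctSorted.join(",")}}`;
}

// The same three input rows as in the query above.
const rows = [
  [{ a: "1", b: "2", c: "3" }, { a: "1", d: "4", e: "5" }],
  [{ a: "1", b: "2", c: "3" }],
  [{ a: "1", c: "3" }],
];
for (const row of rows) {
  console.log(mergedValues(row));
}
```

Note that the duplicate value "1" in the first row appears only once in the result, which is the effect of DISTINCT in the SQL version.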