How to extract all values from variant/object in Snowflake?

A VARIANT column contains the following data:

[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]

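Here the numbers are the values. If I want to extract all the keys as an array, I can use the OBJECT_KEYS function. But how can I extract all the values to get this output?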

[
  "1",
  "2",
  "3"
]

Additional note: the keys, and the values mapped to them, are always the same. A more detailed example, with 3 input records/rows:

[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]
[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  }
]
[
  {
    "a": "1",
    "c": "3"
  }
]

The output should be:

{"1", "2", "3"}
{"1", "2", "3"}
{"1", "3"}

Depending on how fixed the shape of your data is, there are a few ways to do this.

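If, as in your example, the outer array has only one element, or it has several but you only need the first, you can extract that element with mycol[0], lateral flatten it, and then array_agg the VALUEs returned by the lateral flatten.
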
-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select array_agg(first_inner_array.value) result
from data, 
     lateral flatten(input => mycol[0]) first_inner_array
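-- With more than one input row, uncomment the next line to get one array per row
-- (SEQ identifies the input row; INDEX is NULL when flattening an object)
--Group By first_inner_array.seq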
;

If you need to extract values from several elements of the array, you can use two lateral flattens.

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select flat_outer_array.index, array_agg(flat_inner_array.value) 
from data, 
     lateral flatten(input => mycol) flat_outer_array,  
     lateral flatten(input => flat_outer_array.value) flat_inner_array
-- Uncomment the where clause below to use this solution to pick the 1st element only
-- Where flat_outer_array.INDEX = 0     
Group By flat_outer_array.index     ;

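You can also create a simple JavaScript function that extracts the VALUES from an object, the inverse of OBJECT_KEYS, which returns the KEYS of the object you pass it. This avoids the flatten and array_agg operations.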

create Or Replace Function OBJECT_VALUES(input_object object)
-- Pluck the Values from an Object and create an ARRAY.
-- Maintains the order of the OBJECT (KEYS) in the ARRAY.
  returns ARRAY
  language JAVASCRIPT
as
$$
// Return the values from the Object as an ARRAY
  return Object.values(INPUT_OBJECT);
$$;

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
Select OBJECT_VALUES(mycol[0]) from data
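
For reference, the body of the UDF is a single call to the standard JavaScript Object.values. The sketch below runs that call in plain JavaScript, outside Snowflake; the sample objects are made up for illustration. It also shows one caveat on the JavaScript side: keys that look like non-negative integers are visited first, in ascending numeric order, before the remaining string keys.

```javascript
// Plain JavaScript sketch of what the UDF body computes.
// The sample objects are illustrative, not taken from a real table.
const row = { a: "1", b: "2", c: "3" };
const values = Object.values(row);
console.log(JSON.stringify(values)); // ["1","2","3"]

// Caveat: integer-like keys come first, in ascending numeric order,
// followed by the other string keys in insertion order.
const mixed = { b: "x", 2: "y", 1: "z" };
const mixedValues = Object.values(mixed);
console.log(JSON.stringify(mixedValues)); // ["z","y","x"]
```

With purely alphabetic keys such as "a", "b", "c", the values come back in the order the keys appear in the object handed to the function.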

Or, if you need several elements of the ARRAY:

-- CTE to create  data
with data as (Select parse_json('[
  {
    "a": "1",
    "b": "2",
    "c": "3"
  },
  {
    "d": "4",
    "e": "5",
    "f": "6"
  }
]'  ) myCol)
-- Query
Select flat_outer_array.index, OBJECT_VALUES(value) 
from data, 
     lateral flatten(input => mycol) flat_outer_array
Group By 1,2;
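
The same idea can be illustrated on the whole outer array at once. The following is a plain JavaScript sketch only: the helper name valuesPerElement is hypothetical, and it has not been tried as a Snowflake UDF body. It maps Object.values over each object in the array and returns one array of values per element.

```javascript
// Hypothetical helper: one array of values per object in the outer array.
function valuesPerElement(outerArray) {
  return outerArray.map((obj) => Object.values(obj));
}

// Same sample data as in the queries above.
const myCol = [
  { a: "1", b: "2", c: "3" },
  { d: "4", e: "5", f: "6" },
];
const perElement = valuesPerElement(myCol);
console.log(JSON.stringify(perElement)); // [["1","2","3"],["4","5","6"]]
```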

The performance of these options may vary with the size and shape of your data, so you may want to try each of them.

One object per array:

Assuming the data has only one object per array:

With data as (
    select parse_json(column1) as json
    from values
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","c": "3"}]')
)
select 
    '{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data, table(flatten(json[0]))v
group by v.seq
order by v.seq

This gives:

OUTPUT
{"1","2","3"}
{"1","2","3"}
{"1","3"}

Multiple objects per array, merged:

With data as (
    select parse_json(column1) as json
    from values
    ('[{"a": "1","b": "2","c": "3"},{"a": "1","d": "4","e": "5"}]'),
    ('[{"a": "1","b": "2","c": "3"}]'),
    ('[{"a": "1","c": "3"}]')
)
select 
    '{'|| listagg(distinct '"'||v.value||'"', ',') within group (order by '"'||v.value||'"')|| '}' as output
from data
    ,table(flatten(json))a
    ,table(flatten(a.value))v
group by a.seq
order by a.seq

This gives:

OUTPUT
{"1","2","3","4","5"}
{"1","2","3"}
{"1","3"}
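
To make the merge logic explicit, here is a plain JavaScript sketch of the same steps for a single row: collect the values of every object, drop duplicates, sort, and format. The helper name mergedValues is hypothetical, and the sort is JavaScript's default string sort, which is assumed to match the ordering in the query for simple values like these.

```javascript
// Hypothetical helper mirroring the listagg(distinct ...) query for one row:
// gather values from every object, de-duplicate, sort, wrap in quotes and braces.
function mergedValues(outerArray) {
  const all = outerArray.reduce(
    (acc, obj) => acc.concat(Object.values(obj)),
    []
  );
  const distinctSorted = Array.from(new Set(all)).sort();
  return "{" + distinctSorted.map((v) => '"' + v + '"').join(",") + "}";
}

const rowOne = [{ a: "1", b: "2", c: "3" }, { a: "1", d: "4", e: "5" }];
const rowThree = [{ a: "1", c: "3" }];
console.log(mergedValues(rowOne));   // {"1","2","3","4","5"}
console.log(mergedValues(rowThree)); // {"1","3"}
```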