Find the differences between array elements and a hardcoded array in BigQuery

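One of my tables has a json column that holds an array of key-value objects (about 40 of them). The keys appear in the same order in every record, so the i-th element of each record has the same key. The values can be of different types: numbers, strings, nulls, arrays of strings, and so on.
I have a hardcoded json with the same structure as this array, for example [{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
I want to compare this array, element by element, with the array column of every record in the table, and display only the elements whose values do not match.
For example, if a record's array is [{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}], the difference is [{"key": "key 1", "value": "not equal value 1"}]; key 2 and key 3 are skipped because their values are equal.
So for a data sample like this: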

id | json | ...
----------
1  |[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
----------
2  |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
----------
3  |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]
----------
4  |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]
----------
5  |[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]

I expect this result:

id | json
----------
1  |[{"key": "key 1", "value": "not equal value 1"}]
----------
2  |[{"key": "key 2", "value" : "not equal value 2"}]
----------
3  |[{"key": "key 2", "value" : "not equal value 2"}]
----------
4  |[]
----------
5  |[]

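I also want to make a query that groups the result above by key and counts the occurrences of each key. That would show which keys differ from the hardcoded array most often and least often.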

key   | count
--------------
key 2 | 2
key 1 | 1

It is not clear whether you actually have an array column or a json/string column, so I am going with the data sample you provided, which is json.

... display only the elements of the array that have no matching values.

with your_table as (
  select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
  select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' 
), search as (
  select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select id,
  array( 
    select t_element 
    from unnest(json_extract_array(t.json)) t_element, search s 
    left join unnest(json_extract_array(s.json)) s_element
    on t_element = s_element
    where s_element is null
  ) arr
from your_table t      

with output
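Note that the query above treats an element as matching if it equals any element of the hardcoded array, regardless of position. Since the question says the i-th element always carries the same key, a positional comparison may be closer to what is wanted. Below is a minimal sketch of that variant, assuming the same your_table and search shapes as above (shortened to two rows and two keys; the alias pos is just an illustrative name):

```sql
with your_table as (
  select 1 as id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value": "value 2"}]' as json union all
  select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value": "value 2"}]'
), search as (
  select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value": "value 2"}]' as json
)
select t.id,
  array(
    select t_element
    from unnest(json_extract_array(t.json)) t_element with offset pos
    -- keep an element when the hardcoded element at the same position
    -- differs or does not exist (safe_offset returns null past the end)
    where t_element != ifnull(json_extract_array(s.json)[safe_offset(pos)], '')
    order by pos
  ) arr
from your_table t
cross join search s
```

The comparison is made on the JSON text of each element, so this sketch assumes that the keys inside each object appear in the same order in the table and in the hardcoded array.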

... I also want to make a query that will group the result above by key and count their number.

with your_table as (
  select 1 id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json union all
  select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "not equal value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 4, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' union all
  select 5, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' 
), search as (
  select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value" : "value 2"}, {"key": "key 3", "value": "value 3"}]' json
)
select key, count(*) counts
from (
  select id,
    array( 
      select json_extract_scalar(t_element, '$.key') 
      from unnest(json_extract_array(t.json)) t_element, search s 
      left join unnest(json_extract_array(s.json)) s_element
      on t_element = s_element
      where s_element is null
    ) keys
  from your_table t
), unnest(keys) key
group by key     

with output
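If the intermediate arrays are not needed, the count can probably be written more directly, and sorted so that the most frequently differing keys come first. This is only a sketch under the same assumptions as the queries above (json stored as a string, elements compared as JSON text); the alias diff_key is my own name:

```sql
with your_table as (
  select 1 as id, '[{"key": "key 1", "value": "not equal value 1"}, {"key": "key 2", "value": "value 2"}]' as json union all
  select 2, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value": "not equal value 2"}]' union all
  select 3, '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value": "not equal value 2"}]'
), search as (
  select '[{"key": "key 1", "value": "value 1"}, {"key": "key 2", "value": "value 2"}]' as json
)
select json_extract_scalar(t_element, '$.key') as diff_key, count(*) as counts
from your_table t, search s,
  unnest(json_extract_array(t.json)) t_element
-- keep only elements that have no identical element in the hardcoded array
where t_element not in unnest(json_extract_array(s.json))
group by diff_key
order by counts desc
```

Switching to order by counts asc would list the least frequently differing keys first. Keys that never differ do not appear in the result at all.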

I created a table with an array of structs in its schema, as shown below:

I ran the following query:

SELECT m.key, m.value FROM `Project.Dataset.Table`, unnest(jsoncolumn) m group by m.key, m.value having count(*)=1

Output:

You can then group by key to count the rows returned by the previous query, like this:

select key as key, count(*) as count from (SELECT m.key, m.value FROM `Project.Dataset.Table`, unnest(jsoncolumn) m group by m.key, m.value having count(*)=1) group by key

Output:
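The queries above pick out key/value pairs that occur exactly once in the table; they do not look at the hardcoded array itself. If a direct comparison against it is needed with an array-of-structs column, something along these lines might work. It is a sketch only: it assumes the column is ARRAY<STRUCT<key STRING, value STRING>>, and the names your_table, search, expected and diff_count are made up for the example:

```sql
with your_table as (
  select 1 as id, [struct('key 1' as key, 'not equal value 1' as value),
                   struct('key 2' as key, 'value 2' as value)] as jsoncolumn union all
  select 2, [struct('key 1' as key, 'value 1' as value),
             struct('key 2' as key, 'not equal value 2' as value)]
), search as (
  -- the hardcoded array, written as an array of structs
  select [struct('key 1' as key, 'value 1' as value),
          struct('key 2' as key, 'value 2' as value)] as expected
)
select m.key, count(*) as diff_count
from your_table t, search s, unnest(t.jsoncolumn) m
-- keep elements that have no pair with the same key and value in the hardcoded array
where not exists (
  select 1
  from unnest(s.expected) e
  where e.key = m.key and e.value = m.value
)
group by m.key
order by diff_count desc
```

Because the match uses plain equality, a null value never compares equal, so elements with a null value are always reported as differences in this sketch.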