How to remove duplicated rows from a table with arrays in BigQuery

Question

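I have a table in BigQuery that has a REPEATED (array) column and contains duplicated rows. Because the table has arrays, I can't use DISTINCT to keep only one copy of each row.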

The table looks like this:

I want to remove the duplicated rows, and the output should look like this:

I haven't found a way to get the result above. Can anyone help?

Answer 1

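Add a column XYZ filled with an auto-generated number, or number that column yourself, so that every row gets one unique number.

Then run a GROUP BY query over your data and, for each group of identical rows, "select max(columnXYZ) as RowsToKeep". This picks exactly one copy of every row (the one with the highest number). Because the table has arrays, group by a serialized form of the array column (for example TO_JSON_STRING) rather than the array itself. Finally, delete every row whose number is not in RowsToKeep, so that only one copy of each duplicate remains.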
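A minimal BigQuery sketch of these steps, assuming a hypothetical table your_dataset.your_table with the same columns as the sample data in Answer 3 (id, strings, strings2). GENERATE_UUID() stands in for the "auto-generated number", and row_id is a hypothetical helper column; the array is grouped through TO_JSON_STRING because ARRAY values have historically not been usable as GROUP BY keys.

```sql
-- Assumed schema (hypothetical): id INT64, strings ARRAY<STRING>, strings2 INT64.

-- Step 1: give every row a unique identifier in a new helper column row_id.
CREATE OR REPLACE TABLE your_dataset.your_table AS
SELECT *, GENERATE_UUID() AS row_id
FROM your_dataset.your_table;

-- Step 2: in each group of identical rows keep MAX(row_id) and delete the rest.
-- The array column is compared via its JSON text instead of the ARRAY value.
DELETE FROM your_dataset.your_table
WHERE row_id NOT IN (
  SELECT MAX(row_id)
  FROM your_dataset.your_table
  GROUP BY id, TO_JSON_STRING(strings), strings2
);

-- Step 3 (optional): remove the helper column again.
ALTER TABLE your_dataset.your_table DROP COLUMN row_id;
```

Keeping the maximum per group, rather than deleting it, also leaves rows that were never duplicated untouched, since each of them forms a group of one and is its own maximum.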

Answer 2

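Consider the following approach. FORMAT('%t', t) renders the whole row t, array columns included, as a single string, so identical rows fall into the same partition and row_number() keeps exactly one of them: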

select *
from your_table t
where true  -- placeholder filter; QUALIFY has historically required a WHERE, GROUP BY or HAVING clause
qualify 1 = row_number() over(partition by format('%t', t))
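The query above only returns the deduplicated rows. One way to make the change permanent, sketched here with your_dataset.your_table as a placeholder name, is to overwrite the table with that result. CREATE OR REPLACE TABLE rebuilds the table, so settings such as partitioning or clustering may need to be restated in the statement; it is worth checking this before running it against real data.

```sql
-- Overwrite the (placeholder) table with one copy of each distinct row.
CREATE OR REPLACE TABLE your_dataset.your_table AS
SELECT *
FROM your_dataset.your_table AS t
WHERE TRUE
-- Rows that format to the same string are duplicates; keep the first of each.
QUALIFY ROW_NUMBER() OVER (PARTITION BY FORMAT('%t', t)) = 1;
```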

Answer 3

I used sample data in which the row with id 1 is duplicated. You can use this query:

WITH data AS (
  SELECT 1 id, ["a", "a", "b"] strings, 5 strings2
  UNION ALL
  SELECT 1 id, ["a", "a", "b"] strings, 5 strings2
  UNION ALL
  SELECT 3 id, ["z", "a", "b"] strings, 3 strings2
)

-- Flatten each array, then rebuild it from its distinct elements per (id, strings2).
SELECT id, ARRAY_AGG(DISTINCT string) strings, strings2
FROM data, UNNEST(strings) string
GROUP BY id, strings2

This is my result.
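Note that the query above also collapses repeated elements inside each array (["a", "a", "b"] comes back with only "a" and "b", in no guaranteed order), merges rows that share id and strings2 but hold different arrays, and drops rows whose array is empty, because the comma join with UNNEST produces no rows for them. If the goal is to drop only whole-row duplicates and keep the arrays exactly as stored, a sketch using a different technique (grouping by the TO_JSON_STRING of the entire row and keeping one copy per group) could look like this; kept_row is an illustrative alias:

```sql
WITH data AS (
  SELECT 1 AS id, ["a", "a", "b"] AS strings, 5 AS strings2
  UNION ALL
  SELECT 1, ["a", "a", "b"], 5
  UNION ALL
  SELECT 3, ["z", "a", "b"], 3
)
-- Expand the kept struct back into its original columns.
SELECT kept_row.*
FROM (
  -- One group per distinct serialized row; keep any one copy of it.
  SELECT ARRAY_AGG(d LIMIT 1)[OFFSET(0)] AS kept_row
  FROM data AS d
  GROUP BY TO_JSON_STRING(d)
)
```

With this sample data it should return two rows: id 1 with its original three-element array, and id 3.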

Tags: sql, google-bigquery, google-cloud-platform