如何在不展平的情况下在 bigquery 中使用 Records、Struct 和 Array

Question

我在大查询中有一个非常嵌套的 table。这只是其中的一部分：

[
      {
        "mode": "NULLABLE",
        "name": "id",
        "type": "INTEGER"
      },
    
      {
        "fields": [
          {
            "fields": [
              {
                "mode": "NULLABLE",
                "name": "Type",
                "type": "STRING"
              },
              {
                "mode": "NULLABLE",
                "name": "Term",
                "type": "STRING"
              },
              {
                "fields": [
                  {
                    "mode": "NULLABLE",
                    "name": "TenderID",
                    "type": "STRING"
                  },
                  {
                    "mode": "NULLABLE",
                    "name": "CardType",
                    "type": "STRING"
                ],
                "mode": "REPEATED",
                "name": "EftInfo",
                "type": "RECORD"
              },
            ],
            "mode": "REPEATED",
            "name": "Trx",
            "type": "RECORD"
          }
        ],
        "mode": "NULLABLE",
        "name": "transactions",
        "type": "RECORD"
      }
    ]

我只想在每个字段中转换数据，但保留结构。我带来的是这个，但它使我的 table 变平，或者更确切地说，RECORDS 变得可以为空而不是重复，我不知道如何将它们重新排列：

select 
"id",
ARRAY( SELECT AS STRUCT CAST(Trx.Type AS  INT64) Type,
CAST(Trx.Term AS  INT64) Term,
CAST(Trx.TrxNum AS  INT64) TrxNum,
ARRAY(SELECT AS STRUCT CAST(EftInfo.TenderID AS INT64 ) TenderID,
CAST(EftInfo.CardType AS INT64 ) CardType
)EftInfo) Trx ) transactions
from 
   dataset.table, UNNEST(transactions.Trx)Trx, UNNEST(Trx.EftInfo) EftInfo

Answer 1

因此，在尝试了上述所有组合之后，我得出了正确的结果，我可以解释一下。希望，这会为某人节省时间：

SELECT id,
---here just (), because this is structure which cannot have multiple values
(SELECT AS STRUCT 
CAST(transactions.Location AS  INT64) Location,
CAST(transactions.StoreID AS  INT64) StoreID,
---Here array, because this structure can have more entries
ARRAY(SELECT AS STRUCT 
CAST(TenderID AS INT64 ) TenderID,
CAST(CardType AS INT64 ) CardType
FROM UNNEST(Trx.EftInfo) EftInfo)  EftInfo)
FROM UNNEST(transactions.Trx)Trx)Trx
) transactions
from 
   my_ds.my_table

通过这种方式，人们可以深入到 bq 允许构建其结构的深度。 json 是一种控制正在发生的事情的好方法，可以在查询执行后看到。

如何在不展平的情况下在 bigquery 中使用 Records、Struct 和 Array

How to work with Records, Struct and Array in bigquery without flattening them

sql

arrays

struct

record

google-bigquery