BigQuery - Repeated field inside of a repeated record
I have a denormalized table that I would like to simplify using repeated fields in BigQuery.
To illustrate what I am trying to do, I want to go from:
|CustomerNumber|InvoiceNumber|InvoiceLineItem|
|--------------|-------------|---------------|
|78278278 |8765 |VV190 |
|78278278 |8765 |VV191 |
|78278278 |9321 |VV198 |
|78278278 |9321 |VV199 |
to:
|CustomerNumber|InvoiceNumber [REPEATED]|InvoiceLineItem [REPEATED]|
|--------------|------------------------|--------------------------|
|78278278 |8765 |VV190 |
| | |VV191 |
| |------------------------|--------------------------|
| |9321 |VV198 |
| | |VV199 |
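I was able to create this type of schema in BigQuery, but I have not been able to write the SQL query that goes from my denormalized data to the table schema I want.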
[
{
"name": "CustNumber",
"type": "STRING"
},
{
"fields": [
{
"name": "InvoiceNumber",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "InvoiceLineItem",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "OrderInfo",
"type": "RECORD"
}
]
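For reference, the JSON schema above can also be read as BigQuery standard SQL types: `OrderInfo` is an array of structs, and each struct carries its own array of strings. The sketch below expresses that as DDL; the dataset and table name `mydataset.customer_orders` is a hypothetical placeholder, not something from the original setup.

```sql
-- Hypothetical dataset/table name, used for illustration only.
CREATE TABLE mydataset.customer_orders (
  CustNumber STRING,
  -- REPEATED RECORD  -> ARRAY<STRUCT<...>>
  -- REPEATED STRING  -> ARRAY<STRING>
  OrderInfo ARRAY<STRUCT<
    InvoiceNumber STRING,
    InvoiceLineItem ARRAY<STRING>
  >>
);
```

Seen this way, the query needs to build one array per invoice and then one array of structs per customer.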
Screenshot of the schema as shown in the BigQuery console:
Any help would be appreciated.
Thanks :D
Additional resources -
Sample data to play with:
WITH
DATA AS (
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV190" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV191" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV198" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV199" AS InvoiceLineItem )
Try this:
WITH
DATA AS (
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV190" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV191" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV198" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV199" AS InvoiceLineItem )
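SELECT CustomerNumber, ARRAY_AGG(STRUCT(InvoiceNumber, InvoiceLineItem)) AS OrderInfo
FROM (
  -- Inner query: one row per invoice, with its line items collected into an array.
  SELECT CustomerNumber, InvoiceNumber, ARRAY_AGG(InvoiceLineItem) AS InvoiceLineItem
  FROM DATA
  GROUP BY CustomerNumber, InvoiceNumber
)
-- Outer grouping: one row per customer, with its invoices collected into an array of structs.
GROUP BY CustomerNumber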
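One way to check the nested result is to flatten it again with `UNNEST`. The sketch below is only an illustration: the CTE named `nested` is a hypothetical stand-in that hard-codes the expected nested row, so the query can be run on its own. In practice you would select from wherever the nested result is stored.

```sql
WITH nested AS (
  -- Hypothetical stand-in for the nested output: one customer row,
  -- two invoices, each with its own array of line items.
  SELECT
    "78278278" AS CustomerNumber,
    [
      STRUCT("8765" AS InvoiceNumber, ["VV190", "VV191"] AS InvoiceLineItem),
      STRUCT("9321" AS InvoiceNumber, ["VV198", "VV199"] AS InvoiceLineItem)
    ] AS OrderInfo
)
SELECT
  n.CustomerNumber,
  o.InvoiceNumber,
  item AS InvoiceLineItem
FROM nested AS n,
  UNNEST(n.OrderInfo) AS o,          -- one row per invoice
  UNNEST(o.InvoiceLineItem) AS item  -- one row per line item
ORDER BY o.InvoiceNumber, item
```

If the nesting is right, this should give back the flat customer / invoice / line item rows from the original table.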