Schema to load JSON to Google BigQuery

Question

Suppose I have the following JSON, which is the result of parsing URL parameters from a log file.

{
    "title": "History of Alphabet",
    "author": [
        {
            "name": "Larry"
        }
    ]
}

{
    "title": "History of ABC"
}

{
    "number_pages": "321",
    "year": "1999"
}

{
    "title": "History of XYZ",
    "author": [
        {
            "name": "Steve",
            "age": "63"
        },
        {
            "nickname": "Bill",
            "dob": "1955-03-29"
        }
    ]
}

All of the top-level fields ("title", "author", "number_pages", "year") are optional. The same is true of the second-level fields, such as those inside "author".

How should I make a schema for this JSON when loading it to BQ?

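A related question: suppose there is another similar table, but its data is from a different date, so it may have a different schema. Is it possible to query across these 2 tables?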

Answer 1

How should I make a schema for this JSON when loading it to BQ?

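The following schema should work. You may want to change some of the types (for example, you might prefer the dob field to be a TIMESTAMP rather than a STRING), but the overall structure should be similar. Because a field's mode defaults to NULLABLE, every field here can be absent from a given row. The one exception is author, which is marked REPEATED because it holds an array of records; a row with no authors simply omits it.

[
    {
        "name": "title",
        "type": "STRING"
    },
    {
        "name": "author",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [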
            {
                "name": "name",
                "type": "STRING"
            },
            {
                "name": "age",
                "type": "STRING"
            },
            {
                "name": "nickname",
                "type": "STRING"
            },
            {
                "name": "dob",
                "type": "STRING"
            }
        ]
    },
    {
        "name": "number_pages",
        "type": "INTEGER"
    },
    {
        "name": "year",
        "type": "INTEGER"
    }
]
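
One practical detail when loading: BigQuery's JSON loader expects newline-delimited JSON, one object per line, so the pretty-printed records from the question would need to be flattened first. Below is a minimal sketch of that step using only the Python standard library. The helper name to_ndjson is made up for illustration, and the records are the question's samples with the trailing commas removed.

```python
import json

# Sample records from the question (trailing commas removed so they are valid JSON).
records = [
    {"title": "History of Alphabet", "author": [{"name": "Larry"}]},
    {"title": "History of ABC"},
    {"number_pages": "321", "year": "1999"},
    {
        "title": "History of XYZ",
        "author": [
            {"name": "Steve", "age": "63"},
            {"nickname": "Bill", "dob": "1955-03-29"},
        ],
    },
]


def to_ndjson(rows):
    """Serialize each row as compact JSON on its own line (hypothetical helper)."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows) + "\n"


ndjson = to_ndjson(records)
print(ndjson, end="")
```

Missing keys are simply left out of a line; with the NULLABLE and REPEATED modes above, the corresponding columns should load as NULL or as an empty list.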

A related question: suppose there is another similar table, but its data is from a different date, so it may have a different schema. Is it possible to query across these 2 tables?

It should be possible to union two tables with different schemas without much trouble.

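Here is a quick example of how it works over public data (a somewhat silly example, since the tables have zero fields in common, but it shows the concept). In BigQuery's legacy SQL, the comma between the two subqueries means union rather than join: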

SELECT * FROM 
    (SELECT * FROM publicdata:samples.natality), 
    (SELECT * FROM publicdata:samples.shakespeare) 
LIMIT 100;

Note that the SELECT * around each table is required; otherwise the query will complain about the differing schemas.
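
To make the idea concrete, here is a small conceptual model in Python of what such a union produces. It is only an illustration, not how BigQuery implements it, and it assumes the result contains every column from either table, with NULL (None here) wherever a table lacks that column. The function name union_rows and the sample tables day1 and day2 are hypothetical.

```python
def union_rows(*tables):
    """Concatenate rows from several tables, padding missing columns with None.

    Conceptual model only: each table is a list of dicts keyed by column name.
    """
    # Collect column names in first-seen order across all tables.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with the full column set; absent values become None.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Two hypothetical daily tables whose schemas differ.
day1 = [{"title": "History of ABC"}]
day2 = [{"title": "History of XYZ", "year": 1999}]

combined = union_rows(day1, day2)
for row in combined:
    print(row)
```

Rows from the older table simply carry None for the columns that only the newer table has, which is why queries over the union usually need to tolerate NULLs.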
