Schema to load JSON to Google BigQuery
Suppose I have the following JSON, which is the result of parsing URL parameters from a log file.
{
  "title": "History of Alphabet",
  "author": [
    {
      "name": "Larry"
    }
  ]
}
{
  "title": "History of ABC"
}
{
  "number_pages": "321",
  "year": "1999"
}
{
  "title": "History of XYZ",
  "author": [
    {
      "name": "Steve",
      "age": "63"
    },
    {
      "nickname": "Bill",
      "dob": "1955-03-29"
    }
  ]
}
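BigQuery's JSON loader expects newline-delimited JSON, meaning one complete object per line, so pretty-printed objects like the ones above have to be rewritten before loading. A minimal sketch using only Python's standard `json` module, assuming the parsed log output arrives as a single string of concatenated objects written as strict JSON (no trailing commas). `RAW`, `iter_objects`, and `to_ndjson` are illustrative names, not part of any BigQuery API.

```python
import json

# Sample data from the question: pretty-printed JSON objects, one after another.
RAW = """
{
  "title": "History of Alphabet",
  "author": [{"name": "Larry"}]
}
{
  "title": "History of ABC"
}
{
  "number_pages": "321",
  "year": "1999"
}
{
  "title": "History of XYZ",
  "author": [
    {"name": "Steve", "age": "63"},
    {"nickname": "Bill", "dob": "1955-03-29"}
  ]
}
"""


def iter_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def to_ndjson(text):
    """Re-serialize the objects as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(obj) for obj in iter_objects(text)) + "\n"


if __name__ == "__main__":
    print(to_ndjson(RAW), end="")
```

Each output line is a standalone object, for example `{"title": "History of ABC"}` for the second record; missing keys simply do not appear on that line.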
All of the top-level fields ("title", "author", "number_pages", and "year") are optional. The same goes for the second-level fields, such as those inside "author".
How should I make a schema for this JSON when loading it to BQ?
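The following schema should work. You may want to change some of the types (for example, you might want the dob field to be a DATE or TIMESTAMP rather than a STRING), but the overall structure should be similar. Because the mode of a field defaults to NULLABLE, every one of these fields can handle being absent from a given row. Because "author" holds an array of objects, it must also be declared with "mode": "REPEATED"; otherwise the load rejects the array.
[
  {
    "name": "title",
    "type": "STRING"
  },
  {
    "name": "author",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "name",
        "type": "STRING"
      },
      {
        "name": "age",
        "type": "STRING"
      },
      {
        "name": "nickname",
        "type": "STRING"
      },
      {
        "name": "dob",
        "type": "STRING"
      }
    ]
  },
  {
    "name": "number_pages",
    "type": "INTEGER"
  },
  {
    "name": "year",
    "type": "INTEGER"
  }
]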
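A schema of this shape can also be derived mechanically from sample rows: take the union of the keys seen across all records, leave every field NULLABLE, and turn each array of objects into a REPEATED RECORD whose subfields are inferred the same way. A minimal Python sketch of that idea; `infer_fields` and `SAMPLE` are hypothetical names, and the sketch only handles the two shapes in this question (scalar values and arrays of objects). Every leaf is typed STRING, so types such as number_pages or dob still need adjusting by hand.

```python
import json

# The four sample records from the question, already parsed.
SAMPLE = [
    {"title": "History of Alphabet", "author": [{"name": "Larry"}]},
    {"title": "History of ABC"},
    {"number_pages": "321", "year": "1999"},
    {"title": "History of XYZ",
     "author": [{"name": "Steve", "age": "63"},
                {"nickname": "Bill", "dob": "1955-03-29"}]},
]


def infer_fields(records):
    """Merge the keys of many JSON objects into a list of BigQuery-style field dicts."""
    names = []   # field names in first-seen order
    nested = {}  # field name -> objects found inside arrays under that name
    for record in records:
        for name, value in record.items():
            if name not in nested:
                names.append(name)
                nested[name] = []
            if isinstance(value, list):
                # Collect array elements that are objects for recursive inference.
                nested[name].extend(v for v in value if isinstance(v, dict))
    fields = []
    for name in names:
        if nested[name]:
            # An array of objects becomes a repeated record.
            fields.append({"name": name, "type": "RECORD", "mode": "REPEATED",
                           "fields": infer_fields(nested[name])})
        else:
            # Any key may be missing from a row, so it stays nullable.
            fields.append({"name": name, "type": "STRING", "mode": "NULLABLE"})
    return fields


if __name__ == "__main__":
    print(json.dumps(infer_fields(SAMPLE), indent=2))
```

Printed with `indent=2`, the result is a JSON array of field definitions in the same format as the schema above, listing title, author, number_pages, and year in that order, with author's subfields name, age, nickname, and dob.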
A related question: For example, suppose there is another similar table, but the data is from a different date, so it may have a different schema. Is it possible to query across these two tables?
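It should be possible to union two tables with different schemas without much trouble.
Here's a quick example of how it works on public data (a somewhat silly example, since the two tables have zero fields in common, but it shows the concept):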
SELECT * FROM
(SELECT * FROM publicdata:samples.natality),
(SELECT * FROM publicdata:samples.shakespeare)
LIMIT 100;
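Note that you need the SELECT * around each table; otherwise the query complains about the differing schemas. (This is BigQuery legacy SQL, in which a comma-separated list of tables in FROM is treated as UNION ALL.)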
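In GoogleSQL (standard SQL), a comma between tables in FROM is a cross join rather than a union, and UNION ALL matches columns by position, so a union across tables with different columns is typically written with every column listed explicitly and NULL filling in the ones a table lacks. A minimal sketch that generates such a query string; `build_union_all` and the table names are hypothetical, it assumes scalar columns only (repeated RECORD columns such as author may need different handling), and it relies on the NULL literal being coerced to the other branch's column type.

```python
def build_union_all(tables):
    """Build a GoogleSQL UNION ALL over tables whose column sets differ.

    `tables` maps a table name to its list of column names. Every branch selects
    the union of all columns in one fixed order, substituting NULL where a table
    lacks a column, because UNION ALL lines columns up by position, not by name.
    """
    all_columns = []
    for columns in tables.values():
        for col in columns:
            if col not in all_columns:
                all_columns.append(col)
    branches = []
    for table, columns in tables.items():
        select_list = ", ".join(
            col if col in columns else f"NULL AS {col}" for col in all_columns
        )
        branches.append(f"SELECT {select_list} FROM `{table}`")
    return "\nUNION ALL\n".join(branches)


if __name__ == "__main__":
    # Hypothetical daily tables whose schemas differ.
    print(build_union_all({
        "mydataset.books_20130101": ["title", "year"],
        "mydataset.books_20130102": ["title", "number_pages"],
    }))
```

For those two hypothetical tables the generated query is `SELECT title, year, NULL AS number_pages FROM` the first table, `UNION ALL`, then `SELECT title, NULL AS year, number_pages FROM` the second, so both branches expose the same three columns in the same order.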