Loading a nested array into BigQuery from a public Google Cloud dataset
I'm trying to load a public dataset from Google Cloud into BigQuery (quickdraw_dataset). The data is in JSON format and looks like this:
{
"key_id":"5891796615823360",
"word":"nose",
"countrycode":"AE",
"timestamp":"2017-03-01 20:41:36.70725 UTC",
"recognized":true,
"drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
}
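The problem I'm running into is that the "drawing" field is a nested array. From reading other posts, I gather that you can't read nested arrays into BigQuery? One suggested workaround is to read the array in as a string. However, when I use the following schema, I get the error shown below it: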
[
{
"name": "key_id",
"type": "STRING"
},
{
"name": "word",
"type": "STRING"
},
{
"name": "countrycode",
"type": "STRING"
},
{
"name": "timestamp",
"type": "STRING"
},
{
"name": "recognized",
"type": "BOOLEAN"
},
{
"name": "drawing",
"type": "STRING"
}
]
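Error while reading data, error message: JSON parsing error in row starting at position 0: Array specified for non-repeated field: drawing.
Is there a way to read this dataset into BigQuery?
Thanks in advance!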
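Load each whole line as a single string column using the CSV loader, then parse it inside BigQuery. The trick is to pick a field delimiter that should not occur in the data (a tab here), so that every JSON line lands in one column named row.
Load: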
bq load -F '\t' temp.eraser gs://quickdraw_dataset/full/simplified/eraser.ndjson row
Query:
SELECT JSON_EXTRACT_SCALAR(row, '$.countrycode') a
, JSON_EXTRACT_SCALAR(row, '$.word') b
, JSON_EXTRACT_ARRAY(row, '$.drawing')[OFFSET(0)] c
FROM temp.eraser
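If you would rather keep the STRING schema from the question, another option is to preprocess the file so that "drawing" really is a string before loading. The following is only a minimal sketch of that idea, not something from the thread: `stringify_drawing` and `convert_file` are hypothetical helper names, and it assumes you have a local copy of the .ndjson file.

```python
import json


def stringify_drawing(line: str) -> str:
    """Return one NDJSON record with "drawing" re-encoded as a JSON string.

    Hypothetical helper: after this step the value is a plain string,
    so it fits a non-repeated STRING column.
    """
    record = json.loads(line)
    # Serialize the nested array compactly, e.g. [[[1,2],[3,4]]] -> "[[[1,2],[3,4]]]"
    record["drawing"] = json.dumps(record["drawing"], separators=(",", ":"))
    return json.dumps(record)


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a local NDJSON file line by line (paths are assumptions)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(stringify_drawing(line) + "\n")
```

The converted file could then be loaded with `--source_format=NEWLINE_DELIMITED_JSON` and the schema from the question, and the drawing column parsed with JSON_EXTRACT_ARRAY as in the query above. The CSV approach avoids this extra pass over the data, which is why I would try it first.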