Querying Bigtable data with BigQuery: how to write a correct table definition file
I am trying to set up external tables in BigQuery with Bigtable as the source. I can create the table, but running an SQL statement against it fails with this error:
Error while reading data, error message: Error detected while parsing row starting at position: 6.
Error: Data between close double quote (") and field separator.
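Most likely the table definition file is wrong. I spent the last day tinkering with it but could not find the problem. Can someone point out the mistake I made?
Thanks
My workflow:
- Create the definition file
- Run the command: bq mk --external_table_definition=gs://realtimecrypto-ostabprj/def.json test.test1
- In BigQuery, run: SELECT * FROM ostabprj.test.test1 LIMIT 1000
My table definition file (I also tried several variations of it):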
{
"sourceFormat": "BIGTABLE",
"sourceUris": [
"https://googleapis.com/bigtable/projects/ostabprj/instances/cryptorealtime/tables/cryptorealtime"
],
"bigtableOptions": {
"columnFamilies" : [
{
"familyId": "market",
"type": "STRING",
"encoding": "TEXT"
}
]
}
}
The data in Bigtable looks like this
(from this tutorial: https://www.cloudskillsboost.google/focuses/5570?locale=en&parent=catalog)
----------------------------------------
XRP/USD#bitfinex#1641740466459#3848397969954
market:delta @ 2022/01/09-15:01:06.459000
"459"
market:exchangeTime @ 2022/01/09-15:01:06.459000
"1641740466000"
market:market @ 2022/01/09-15:01:06.459000
"bitfinex"
market:orderType @ 2022/01/09-15:01:06.459000
"BID"
market:price @ 2022/01/09-15:01:06.459000
"0.74557"
market:volume @ 2022/01/09-15:01:06.459000
"50"
The guides I used:
https://cloud.google.com/bigquery/external-data-bigtable#permanent-tables
https://cloud.google.com/bigquery/external-table-definition#tabledef-bigtable
(the links have since changed)
A colleague found the solution in the meantime.
He also found a YouTube video that covers most of my problem:
https://www.youtube.com/watch?v=tW4h6-cQz9s
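I made two basic mistakes:
- Bigtable and BigQuery were not in the same location (US and EU)
- The table definition file has to be uploaded through the console, not to a storage bucket
His table definition file looks like this: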
- 'readRowkeyAsString' should be set to true
{
"sourceFormat": "BIGTABLE",
"sourceUris": [
"https://googleapis.com/bigtable/projects/..."
],
"bigtableOptions": {
"readRowkeyAsString": "true",
"columnFamilies": [
{
"familyId": "market",
"columns":[
{
"qualifierString": "market",
"type":"STRING"
},
{
"qualifierString": "exchangeTime",
"type":"STRING"
},
{
"qualifierString": "delta",
"type":"STRING"
},
{
"qualifierString": "orderType",
"type":"STRING"
},
{
"qualifierString": "volume",
"type":"STRING"
},
{
"qualifierString": "price",
"type":"STRING"
}
]
}
]
}
}
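Since the definition repeats the same stanza for every column qualifier, it can be generated instead of typed by hand. The following is only a minimal sketch: the helper name `build_table_definition` and the `QUALIFIERS` list are made up for illustration, the option names are copied verbatim from the file above, and the source URI is left elided exactly as in the post.

```python
import json


def build_table_definition(source_uri, family_id, qualifiers):
    """Return an external table definition (as a dict) for one Bigtable column family."""
    return {
        "sourceFormat": "BIGTABLE",
        "sourceUris": [source_uri],
        "bigtableOptions": {
            # Option name and value copied verbatim from the definition in this post.
            "readRowkeyAsString": "true",
            "columnFamilies": [
                {
                    "familyId": family_id,
                    # One entry per column qualifier, all read as STRING.
                    "columns": [
                        {"qualifierString": q, "type": "STRING"} for q in qualifiers
                    ],
                }
            ],
        },
    }


# Qualifiers of the "market" family, in the order used in the definition above.
QUALIFIERS = ["market", "exchangeTime", "delta", "orderType", "volume", "price"]

if __name__ == "__main__":
    definition = build_table_definition(
        "https://googleapis.com/bigtable/projects/...",  # elided in the post; use your own URI
        "market",
        QUALIFIERS,
    )
    # Print the JSON so it can be saved as the definition file.
    print(json.dumps(definition, indent=2))
```

Running the script prints JSON with the same structure as the file above, which also rules out slips such as a missing opening brace.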
To use all of this in BigQuery, a view is needed:
CREATE VIEW Table.Dataview AS
SELECT
rowkey,
market.delta as ts,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.delta.cell)), "") AS delta,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.market.cell)), "") AS market,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.exchangeTime.cell)), "") AS exchangeTime,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.orderType.cell)), "") AS orderType,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.price.cell)), "") AS price,
ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.volume.cell)), "") AS volume
FROM Table.Data
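To see what the view does, it may help to mimic it outside BigQuery. With the definition above, each column is exposed as a record holding a repeated `cell` field, and each cell carries a `timestamp` and a `value`; the view joins the cell values into one string. The sketch below reproduces that in plain Python. The function name `flatten_cells` and the dict layout are illustrative assumptions, and the values come from the sample row shown earlier.

```python
def flatten_cells(cells):
    """Join cell values like ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(cell)), "")."""
    # ARRAY_TO_STRING omits NULL elements when no null_text is given, so skip None here.
    return "".join(c["value"] for c in cells if c["value"] is not None)


# One row, shaped like the external table output (assumed layout, sample values).
TS = "2022-01-09 15:01:06.459"
row = {
    "rowkey": "XRP/USD#bitfinex#1641740466459#3848397969954",
    "market": {
        "delta": {"cell": [{"timestamp": TS, "value": "459"}]},
        "exchangeTime": {"cell": [{"timestamp": TS, "value": "1641740466000"}]},
        "market": {"cell": [{"timestamp": TS, "value": "bitfinex"}]},
        "orderType": {"cell": [{"timestamp": TS, "value": "BID"}]},
        "price": {"cell": [{"timestamp": TS, "value": "0.74557"}]},
        "volume": {"cell": [{"timestamp": TS, "value": "50"}]},
    },
}

# Build the flat record that the view returns for this row.
flat = {"rowkey": row["rowkey"]}
for qualifier, column in row["market"].items():
    flat[qualifier] = flatten_cells(column["cell"])

print(flat)
```

Note that if a column holds more than one cell version, the values are concatenated into a single string, so this approach fits best when each column keeps exactly one version.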