BigQuery: Loading Avro files whose date columns are longs with a timestamp logical type
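I can't get BigQuery to load timestamps correctly from an Avro file.
The date columns in the Avro file are stored as long with the logical type timestamp-micros. According to the documentation, BigQuery should store them as the TIMESTAMP data type. I have also tried timestamp-millis as the logical type.
This is how the data is stored in Avro: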
{'id': '<masked>', '<masked>': '<masked>', 'tm': 1553990400000, '<masked>': <masked>, '<masked>': <masked>, 'created': 1597056958864}
The fields tm and created are longs, representing 2019-03-31T00:00:00Z and 2020-08-10T11:50:58.986816592Z.
The Avro schema is:
{"type":"record","name":"SomeMessage","namespace":"com.df",
"fields":
[{"name":"id","type":"string"},
{"name":"<masked>","type":"string"},
{"name":"tm","type":"long","logicalType":"timestamp-micros"},
{"name":"<masked>","type":"int"},
{"name":"<masked>","type":"float"},
{"name":"created","type":"long","logicalType":"timestamp-micros"}]}
When imported into BigQuery via bq load, the records end up looking like this:
<masked> <masked> tm <masked> <masked> created
________________________________________________________________________________________________________
<masked> | <masked> | 1970-01-18 23:39:50.400 UTC | <masked> | <masked> | 1970-01-19 11:37:36.958864 UTC
________________________________________________________________________________________________________
The import command used is:
bq load --source_format=AVRO --use_avro_logical_types some_dataset.some_table "gs://some-bucket/some.avro"
The timestamps in BigQuery are nowhere near the actual values provided in the Avro file.
Does anyone know how to do this correctly?
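As a sanity check on the numbers in the question, the sketch below decodes the two sample values both as microseconds and as milliseconds since the Unix epoch. The helper names from_micros and from_millis are made up for illustration. Reading the values as microseconds reproduces the 1970 timestamps shown in the bq output, while reading them as milliseconds gives dates in 2019 and 2020, which suggests the stored values are milliseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_micros(value: int) -> datetime:
    """Interpret an integer as microseconds since the Unix epoch (hypothetical helper)."""
    return EPOCH + timedelta(microseconds=value)


def from_millis(value: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (hypothetical helper)."""
    return EPOCH + timedelta(milliseconds=value)


# Sample values copied from the record in the question.
tm = 1553990400000
created = 1597056958864

for name, value in (("tm", tm), ("created", created)):
    print(name, "as micros:", from_micros(value).isoformat())
    print(name, "as millis:", from_millis(value).isoformat())
```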
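I found out that the Avro schema was actually wrong. The logicalType attribute belongs inside the type definition rather than on the field itself, and since the stored values are milliseconds, the logical type should be timestamp-millis.
The timestamp fields should look like this: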
{"name":"created","type":{"type":"long", "logicalType":"timestamp-millis"}}
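To apply the same correction to every timestamp field, a small script along these lines could rewrite the schema JSON. This is only a sketch: nest_logical_types is a hypothetical helper, the masked fields are left out, and it assumes every field carrying a field-level logicalType really holds epoch milliseconds. It only edits the schema; since Avro files carry their writer schema, the data files would most likely need to be written again with the corrected schema before running bq load.

```python
import json


def nest_logical_types(schema: dict, logical_type: str = "timestamp-millis") -> dict:
    """Return a copy of a record schema with field-level logicalType moved into the type.

    Hypothetical helper: assumes each affected field stores epoch milliseconds,
    so the logical type is replaced by `logical_type`.
    """
    fixed = dict(schema)
    fixed_fields = []
    for field in schema["fields"]:
        field = dict(field)  # copy so the input schema is left untouched
        if "logicalType" in field and isinstance(field["type"], str):
            field.pop("logicalType")
            field["type"] = {"type": field["type"], "logicalType": logical_type}
        fixed_fields.append(field)
    fixed["fields"] = fixed_fields
    return fixed


# Schema from the question, with the masked fields omitted.
original = {
    "type": "record",
    "name": "SomeMessage",
    "namespace": "com.df",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "tm", "type": "long", "logicalType": "timestamp-micros"},
        {"name": "created", "type": "long", "logicalType": "timestamp-micros"},
    ],
}

corrected = nest_logical_types(original)
print(json.dumps(corrected, indent=2))
```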