BigQuery does not recognise timezones when parsing

From the official documentation:

Time zones are used when parsing timestamps or formatting timestamps for display. The timestamp value itself does not store a specific time zone, nor does it change when you apply a time zone offset.

Time zones are represented by strings in one of these two canonical formats:

  • Offset from Coordinated Universal Time (UTC), or the letter Z for UTC
  • Time zone name from the tz database

Example: 2014-09-27 12:30:00.45 America/Los_Angeles
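To illustrate the point in the quoted documentation that a timestamp is an absolute instant and the zone only affects how it is parsed or displayed, here is a minimal Python sketch. It uses the standard library's `datetime` and `zoneinfo` (Python 3.9+) rather than BigQuery itself, so it is an analogy for the documented behaviour, not a demonstration of BigQuery's parser. The sample instant is chosen to match the documentation's example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One absolute instant, expressed in UTC.
instant = datetime(2014, 9, 27, 19, 30, 0, 450000, tzinfo=timezone.utc)

# Rendering it in another zone changes the wall-clock text, not the instant.
in_la = instant.astimezone(ZoneInfo("America/Los_Angeles"))

print(instant.isoformat())
print(in_la.isoformat())
print(instant == in_la)  # both objects refer to the same point in time
```

The two printed strings differ in wall-clock time and offset, yet the comparison treats them as equal because they describe the same instant.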

This is what I am trying to do:

timestamp = dateutil.parser.isoparse(log['timestamp'])
log['local_timestamp'] = timestamp.strftime("%Y-%m-%d %H:%M:%S") + ' Europe/Zurich'

This entry is then written to BigQuery by an Apache Beam Python Dataflow job, which produces this error:

There were errors inserting to BigQuery. Will not retry. Errors were [{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'local_timestamp', 'debugInfo': '', 'message': 'Unrecognized timezone: Europe/Zurich'}]}, {'index': 1, 'errors': [{'reason': 'invalid', 'location': 'local_timestamp', 'debugInfo': '', 'message': 'Unrecognized timezone: Europe/Zurich'}]}]

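I have tried different formats, such as appending +2:00 at the end of the timestamp or in other positions, and even America/Los_Angeles as shown in the example. All of them result in the unrecognized timezone error. Only UTC seems to work.

Am I doing something wrong, or is the documentation incorrect and only UTC timestamps are accepted?

Thanks!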

No, BQ accepts times in time zones other than UTC.

SELECT
  CURRENT_TIMESTAMP() AS datetime_ymdhms,
  DATETIME(CURRENT_TIMESTAMP(),
    "Europe/Zurich") AS datetime_tstz;

I guess the problem is “Europe/Zurich”. Can you try 'Europe/Zurich'?

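If you are using batch loads into BigQuery, you may want to inspect the files generated in the temporary Cloud Storage path before they are loaded. With that information you can tell whether this is a Dataflow problem (most likely your code is generating a badly formatted value) or a BigQuery problem (for whatever odd reason, the TZ is not accepted).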
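One way to narrow this down before anything reaches BigQuery is to check the timestamp strings locally. The sketch below is only a conservative pre-flight check under stated assumptions: `find_unparseable` is a hypothetical helper, and it relies on Python's `datetime.fromisoformat`, which is not BigQuery's parser and is stricter than the documentation quoted above (it rejects time zone names outright). It flags anything that is not a plain date-time with an optional numeric offset.

```python
from datetime import datetime

def find_unparseable(rows, field):
    """Hypothetical helper: return (index, value) pairs that Python cannot parse.

    Uses datetime.fromisoformat as a stand-in check; this is NOT BigQuery's
    own parser, only a conservative local approximation.
    """
    bad = []
    for index, row in enumerate(rows):
        value = row.get(field)
        try:
            datetime.fromisoformat(value)
        except (TypeError, ValueError):
            bad.append((index, value))
    return bad

rows = [
    {"local_timestamp": "2021-10-20 10:37:24+02:00"},          # numeric offset
    {"local_timestamp": "2021-10-20 10:37:24 Europe/Zurich"},  # zone name suffix
    {"local_timestamp": "2021-10-20 10:37:24+2:00"},           # malformed offset
]
print(find_unparseable(rows, "local_timestamp"))
```

Rows flagged here are worth comparing against the `index` values in the BigQuery insert errors.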

I used the basic code sample from the documentation on streaming data into BigQuery to check the tabledata.insertAll method, reproducing a similar timestamp conversion.

from google.cloud import bigquery
import dateutil.parser
import pytz

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of table to append to.
table_id = "your-project.your_dataset.your_table"

# Parse a naive timestamp, then attach the Europe/Zurich zone explicitly.
# (Calling astimezone() on a naive datetime would assume the machine's
# local zone, so the result would depend on where the script runs.)
t1 = dateutil.parser.isoparse('2021-10-20 10:37:24')
t2 = str(pytz.timezone('Europe/Zurich').localize(t1))

# The table is expected to have a TIMESTAMP column named "t1".
rows_to_insert = [
    {"t1": t2}
]

errors = client.insert_rows_json(table_id, rows_to_insert)  # Make an API request.
if errors == []:
    print("New rows have been added.")
    print(t2)
else:
    print("Encountered errors while inserting rows: {}".format(errors))

It runs correctly and keeps the Europe/Zurich offset:

New rows have been added.
2021-10-20 10:37:24+02:00

Because the BigQuery UI console shows timestamps in UTC, my test record was correctly converted and inserted into the target table.
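Applied to the code in the question, the same idea would be to emit a numeric UTC offset instead of appending the zone name. The following is a sketch under assumptions: `to_local_timestamp` is a hypothetical helper, the sample `log` entry is made up for illustration, and it uses the standard library's `zoneinfo` (Python 3.9+) in place of dateutil and pytz. The output has the same shape as the string that was accepted in the test above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_local_timestamp(iso_string, tz_name="Europe/Zurich"):
    """Hypothetical helper: render an ISO 8601 timestamp with a numeric offset."""
    parsed = datetime.fromisoformat(iso_string)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    local = parsed.astimezone(ZoneInfo(tz_name))
    # Produces e.g. "YYYY-MM-DD HH:MM:SS+HH:MM" rather than a zone name suffix.
    return local.isoformat(sep=" ", timespec="seconds")

# Made-up log entry for illustration only.
log = {"timestamp": "2021-10-20T08:37:24+00:00"}
log["local_timestamp"] = to_local_timestamp(log["timestamp"])
print(log["local_timestamp"])
```

Note that, per the documentation quoted in the question, a TIMESTAMP value does not store a zone, so the column still holds the same instant regardless of which offset was used when writing it.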

I hope this clarifies what I was trying to explain in the comments.