Pubsub to BQ Dataflow template is not parsing RECORD type data

I am using the Dataflow template "Pub/Sub Topic to BigQuery" to parse JSON messages whose schema contains a RECORD type field. Example:

{
"url":"/i?session_duration=61&app_key=123456&device_id=gdfttyty&sdk_name=javascript_native_web&sdk_version=18.04",
"body":
    {
    "session_duration":"61",
    "app_key":"eyrttyuyyu78jkjk",
    "device_id":"h1bh41yptik1vtwr8",
    "sdk_name":"javascript_native_web",
    "sdk_version":"18.04",
    "timestamp":"1597057884636",
    "hour":"10",
    "dow":"1"
    },
"app_key":"eyrttyuyyu78jkjk",
"timestamp":"1597057884636",
"ip_address":"0.0.0.0"
}

The schema defined in BigQuery is as follows:

[

   {
      "name":"url",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {
      "name":"body",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[
         {
            "name":"session_duration",
            "type":"STRING",
            "mode":"NULLABLE"
         },
         {
            "name":"app_key",
            "type":"STRING",
            "mode":"NULLABLE"
         },
         {
            "name":"device_id",
            "type":"STRING",
            "mode":"NULLABLE"
         }, 
         {
            "name":"sdk_name",
            "type":"STRING",
            "mode":"NULLABLE"
         },
         {
            "name":"sdk_version",
            "type":"STRING",
            "mode":"NULLABLE"
         }, 
         {
            "name":"timestamp",
            "type":"TIMESTAMP",
            "mode":"NULLABLE"
         }, 
         {
            "name":"hour",
            "type":"TIME",
            "mode":"NULLABLE"
         },      
         {
            "name":"dow",
            "type":"STRING",
            "mode":"NULLABLE"
         }

      ]
   },
   {
      "name":"app_key",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {
      "name":"timestamp",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {
      "name":"ip_address",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

Error message:

{"errors":[{"debugInfo":"","location":"","message":"Repeated record added outside of an array.","reason":"invalid"}],"index":0}

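If I parse the data without the RECORD type, it is parsed correctly into the appropriate BigQuery table. But if the RECORD type is present, the message is not inserted and ends up in the generated BQ table with the error above.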

I successfully inserted your sample into BigQuery with the Dataflow Pub/Sub to BigQuery template after applying some modifications:

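  • I enclosed the repeated field in an array by putting it inside square brackets [...]. The schema declares body as REPEATED, so BigQuery expects an array of records, not a single record.
  • The body.timestamp value is invalid. See the BigQuery documentation for the difference between the BigQuery TIMESTAMP data type and a UNIX timestamp. Depending on what you want to do with this timestamp, you can choose how to handle it. If you do not need it for analysis, you can simply change the data type of the field to INT64 or STRING, just as you did for the top-level timestamp column.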

So the message should look like this:

{
"url":"/i?session_duration=61&app_key=123456&device_id=gdfttyty&sdk_name=javascript_native_web&sdk_version=18.04",
"body": [
    {
    "session_duration":"61",
    "app_key":"eyrttyuyyu78jkjk",
    "device_id":"h1bh41yptik1vtwr8",
    "sdk_name":"javascript_native_web",
    "sdk_version":"18.04",
    "timestamp":"1597057884636",
    "hour":"10",
    "dow":"1"
    }],
"app_key":"eyrttyuyyu78jkjk",
"timestamp":"1597057884636",
"ip_address":"0.0.0.0"
}
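
If the publisher cannot easily be changed by hand, the wrapping step can be done in code before the message is sent to Pub/Sub. The following is only a minimal sketch: the helper name wrap_repeated and the shortened sample payload are made up for illustration, and it assumes the message is available as a JSON string.

```python
import json


def wrap_repeated(message: dict, field: str = "body") -> dict:
    """Return a shallow copy of message where `field` is a list of records.

    Hypothetical helper: a REPEATED RECORD column expects a JSON array,
    so a single object is wrapped in a one-element list.
    """
    fixed = dict(message)
    value = fixed.get(field)
    # Leave the field alone if it is missing or already a list.
    if value is not None and not isinstance(value, list):
        fixed[field] = [value]
    return fixed


# Shortened version of the sample message from the question.
raw = (
    '{"url": "/i?session_duration=61", '
    '"body": {"session_duration": "61", "dow": "1"}, '
    '"ip_address": "0.0.0.0"}'
)

payload = json.dumps(wrap_repeated(json.loads(raw)))
print(payload)  # this string would be the Pub/Sub message data
```

Messages whose body is already an array pass through unchanged, so the helper can be applied to every message.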

And the schema, with the data type of the body.timestamp field changed, looks like this:

[

   ...
   
   ,
   {
      "name":"body",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[
         
         ...

         , 
         {
            "name":"timestamp",
            "type":"STRING",
            "mode":"NULLABLE"
         }, 
         
         ...
         
      ]
   },
   
   ...
   
]
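
If you would rather keep body.timestamp as a TIMESTAMP column, another option is to convert the value before publishing. The sketch below assumes the value is a UNIX epoch in milliseconds (which is what 1597057884636 looks like) and that an ISO 8601 string with a UTC offset is acceptable for the column. The function name millis_to_timestamp is hypothetical, and I have not tested this variant with the template.

```python
from datetime import datetime, timezone


def millis_to_timestamp(millis: str) -> str:
    """Convert epoch milliseconds (as a string) to an ISO 8601 UTC string.

    Hypothetical helper. Integer arithmetic is used so that no
    floating-point rounding affects the fractional seconds.
    """
    seconds, remainder_ms = divmod(int(millis), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=remainder_ms * 1000).isoformat()


converted = millis_to_timestamp("1597057884636")
print(converted)  # 2020-08-10T11:11:24.636000+00:00
```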