How to extract data from complex JSON stored in Snowflake via SnowSQL?

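I have millions of JSON documents stored in a single VARIANT column of a Snowflake table. They all follow the format below, but the number of lines (key/value pairs) differs from one JSON to the next.

Can anyone give me some guidance on how to extract this data into a flat table? I'm new to working with JSON files, and the inconsistent number of lines and the lack of an indicator that defines the object name are confusing me.

Here is a sample JSON: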

{
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit": 0.2714572,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP": 0,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP": 0,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL": 8.732743,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL": 16.13105,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip": 1.3,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor": 4.167005,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor unit": "",
  "DeviceId": "streamingdevice",
  "EventEnqueuedUtcTime": "2020-05-04T22:12:21.5310000Z",
  "EventProcessedUtcTime": "2020-05-04T22:12:35.6868329Z",
  "IoTHub": {
    "ConnectionDeviceGenerationId": "637199801617320690",
    "ConnectionDeviceId": "streamingdevice",
    "CorrelationId": null,
    "EnqueuedTime": "2020-05-04T22:12:21.0000000",
    "MessageId": null,
    "StreamId": null
  },
  "PartitionId": 1,
  "Timestamp": "2019-10-30 13:48:05.000000"
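}

"Edge 93 Belgium 43-23-19 1932" is the object name; each JSON holds data for a single object.

"Time_1_Avg.AB2 Weight on Bit" is the reading type, which is essentially made up of Tag1.Tag2.

The value at the end of each line is the reading.

The Timestamp at the bottom of the JSON is the reading time.

This part is not needed: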

  "DeviceId": "streamingdevice",
  "EventEnqueuedUtcTime": "2020-05-04T22:12:21.5310000Z",
  "EventProcessedUtcTime": "2020-05-04T22:12:35.6868329Z",
  "IoTHub": {
    "ConnectionDeviceGenerationId": "637199801617320690",
    "ConnectionDeviceId": "streamingdevice",
    "CorrelationId": null,
    "EnqueuedTime": "2020-05-04T22:12:21.0000000",
    "MessageId": null,
    "StreamId": null
  },
  "PartitionId": 1,

The ideal output for this data would be:

But getting something like this would already be really helpful:

Thanks for your help!

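Assuming the keys you need always have three period-separated components, here is one solution:

  • Use the FLATTEN table function to take any VARIANT column from the table (a one-row constant in the example) and explode it into multiple rows
  • Rely on the THIS column produced by FLATTEN to emit a row-constant value (Timestamp) for each expanded row
  • Use a NOT IN filter to exclude the key names you do not need
  • Use the SPLIT function with an index to break the extracted key into multiple columns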
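SELECT
  SPLIT(KEY, '.')[0]::STRING AS "Object Name"  -- text before the first period
, SPLIT(KEY, '.')[1]::STRING AS "Tag 1"        -- ::STRING drops the quotes a VARIANT would display
, SPLIT(KEY, '.')[2]::STRING AS "Tag 2"
, VALUE AS "Value"
, THIS:Timestamp::TIMESTAMP AS "Timestamp"     -- THIS is the object being flattened, so this repeats on every row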
FROM TABLE(FLATTEN(PARSE_JSON('
{
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit": 0.2714572,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP": 0,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP": 0,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL": 8.732743,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL": 16.13105,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip": 1.3,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque": -999.25,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque unit": "",
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor": 4.167005,
  "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor unit": "",
  "DeviceId": "streamingdevice",
  "EventEnqueuedUtcTime": "2020-05-04T22:12:21.5310000Z",
  "EventProcessedUtcTime": "2020-05-04T22:12:35.6868329Z",
  "IoTHub": {
    "ConnectionDeviceGenerationId": "637199801617320690",
    "ConnectionDeviceId": "streamingdevice",
    "CorrelationId": null,
    "EnqueuedTime": "2020-05-04T22:12:21.0000000",
    "MessageId": null,
    "StreamId": null
  },
  "PartitionId": 1,
  "Timestamp": "2019-10-30 13:48:05.000000"
}
')))
WHERE
  KEY NOT IN ('DeviceId', 'IoTHub', 'PartitionId', 'Timestamp', 'EventEnqueuedUtcTime', 'EventProcessedUtcTime');

This should produce a result similar to your first screenshot.
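
To run the same logic against a real table instead of a constant, something along these lines may work. This is only a sketch: the table name RAW_READINGS and the VARIANT column name PAYLOAD are hypothetical placeholders, so substitute your own. It also swaps the NOT IN list for a LIKE pattern that keeps only keys containing two periods, which is an assumption about your data rather than something the sample guarantees.

```sql
-- Hypothetical names: table RAW_READINGS with a single VARIANT column PAYLOAD.
SELECT
  SPLIT_PART(f.KEY, '.', 1)      AS object_name   -- SPLIT_PART is 1-based and returns a string
, SPLIT_PART(f.KEY, '.', 2)      AS tag_1
, SPLIT_PART(f.KEY, '.', 3)      AS tag_2
, f.VALUE                        AS reading
, t.PAYLOAD:Timestamp::TIMESTAMP AS reading_time  -- same value for every row of one JSON
FROM RAW_READINGS t,
     LATERAL FLATTEN(INPUT => t.PAYLOAD) f        -- one output row per top-level key
WHERE f.KEY LIKE '%.%.%'         -- assumption: only Object.Tag1.Tag2 keys contain two periods
  AND f.KEY NOT LIKE '% unit';   -- optional: skip the companion "... unit" keys
```

Because FLATTEN emits one row per top-level key, the varying number of lines per JSON does not matter. Wrapping the query in CREATE TABLE ... AS SELECT would materialize the result as a flat table. If a tag name could itself contain a period, the three-part split would need adjusting.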