json_extract_scalar fails with parenthesis in key name

Question

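I have a string field containing raw JSON data sent from a server. However, the key contains parentheses, which seems to cause problems when I try to extract the data.

Sample data: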

{"Interview (Onsite)": "2015-04-06 16:58:28"}

Extraction attempt:

timestamp(max(json_extract_scalar(a.status_history, '$.Interview (Onsite)')))

(The max function is used because status_history is a repeated field.)

Error:

JSONPath parse error at: (Onsite)

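I have tried several common ways of escaping the parentheses, to no avail.

Any advice on how to work around this would be much appreciated. I would rather not resort to regular expressions unless I really have to.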

Answer 1

With standard SQL enabled (uncheck "Use Legacy SQL" under "Show Options" in the UI), you can use a quoted string as part of the JSON path. For example:

SELECT
  CAST(JSON_EXTRACT_SCALAR(
    '{"Interview (Onsite)": "2015-04-06 16:58:28"}',
    "$['Interview (Onsite)']") AS TIMESTAMP) AS t;

Edit: since you have an ARRAY<STRING> column, you need to use an ARRAY subquery to apply JSON_EXTRACT_SCALAR to each element. For example:

WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  ARRAY (
    SELECT CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
    FROM UNNEST(status_history) AS history
  ) AS interview_times
FROM T;

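Alternatively, if you don't care about preserving the structure of the array, you can "flatten" it with a join, which returns one row for each element of the array: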

WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
    AS interview_time
FROM T CROSS JOIN UNNEST(status_history) AS history;
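
Since the question wraps the extraction in max, the same correlated-subquery pattern can probably be reduced to a single latest timestamp per row. The following is a minimal sketch under that assumption, reusing the sample data from above; the alias latest_interview is made up for illustration and is not part of the original answer:

```sql
WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  -- Scalar subquery: one value per input row, the latest timestamp
  -- found among that row's array elements.
  -- latest_interview is a hypothetical alias.
  (SELECT MAX(CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP))
   FROM UNNEST(status_history) AS history) AS latest_interview
FROM T;
```

Array elements that lack the key make JSON_EXTRACT_SCALAR return NULL, and MAX ignores NULL inputs, so such elements should not affect the result.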

See also the section of the migration guide on handling of repeated fields.


jsonpath

google-bigquery