Can an entire table (as a string) be included in a SQL statement for BigQuery?

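I sometimes want to test certain BQ functions and SQL statements in the BQ console without creating a test table in my dataset. For example, I can test regexp_match in the console with the following: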

Select Regexp_extract(StringToParse,r'\b(à)\b') as Extract,
 Regexp_match(StringToParse,r'\b(à)\b') as match,
FROM
(SELECT 'Voilà la séance qui est à Paris.' as StringToParse)

I would like to do the same with full tables, possibly given as a JSON string.

For example, if I had a test table with two records:

[
   {"rowNumber":1,
    "index": [1,2,3]
   },
   {"rowNumber":2,
    "index": [2,7,8,15]
   }
]

Could I hand that table to BQ SQL for testing? Something like:

Select max(index) as max from parse('long json string')....

I know no schema is given, so an on-the-fly table may not be possible.

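The schema would be as follows (well, in place of 'string' I would have a 'record' for the array of integers, possibly; that is the kind of thing I want to test):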

[
  {
    "name":"rowNumber",
    "type":"integer"
  },
  {
    "name": "index"
    "type": "record" (oops, can't put an array of integers here...)
   },
]

I'm pretty sure BigQuery supports syntax like this:

select *
from (select 'a' as cola, 1 as col1) a,
     (select 'b', 2) b;

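That is, you can define a "table" within a query using select and union all. (In BigQuery legacy SQL, the comma between the two subqueries acts as UNION ALL rather than as a cross join.)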
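In the standard SQL dialect the comma between subqueries means a cross join, so the same inline table needs an explicit UNION ALL. A minimal sketch, assuming the query runs with standard SQL enabled (the alias `t` is just an illustrative name):

```sql
#standardSQL
-- Inline "table" built from literal rows; no dataset table is read.
SELECT cola, col1
FROM (
  SELECT 'a' AS cola, 1 AS col1   -- first row also fixes the column names
  UNION ALL
  SELECT 'b', 2                   -- later rows only need matching types
) AS t
ORDER BY col1;
```

Only the first SELECT needs column aliases; the remaining branches are matched by position.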

Based on your sample data, the output schema you want is:

[
  {
    "name":"rowNumber",
    "type":"integer"
  },
  {
    "name": "index",
    "type": "integer",
    "mode": "repeated"
  }
]

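Here's something that works for your example, finding the MAX of each index. The "SELECT NULL" in the innermost SELECT is unfortunate, but BigQuery complains about using SPLIT without a FROM clause.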

SELECT rowNumber, MAX(index) AS max_index FROM
  (SELECT 1 AS rowNumber, INTEGER(SPLIT('1,2,3')) AS index FROM (SELECT NULL)),
  (SELECT 2 AS rowNumber, INTEGER(SPLIT('2,7,8,15')) AS index FROM (SELECT NULL))
GROUP BY rowNumber
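For comparison, here is a sketch of the same two-row table in the standard SQL dialect, where array and struct literals express the repeated integer field directly, without SPLIT. It assumes standard SQL is enabled; the aliases `t` and `i` are illustrative.

```sql
#standardSQL
-- Two literal rows, each with a repeated INT64 field named index.
SELECT
  t.rowNumber,
  -- Scalar subquery: take the maximum over this row's array elements.
  (SELECT MAX(i) FROM UNNEST(t.`index`) AS i) AS max_index
FROM UNNEST([
  STRUCT(1 AS rowNumber, [1, 2, 3] AS `index`),
  STRUCT(2 AS rowNumber, [2, 7, 8, 15] AS `index`)
]) AS t
ORDER BY t.rowNumber;
```

With the sample data this should return a max_index of 3 for rowNumber 1 and 15 for rowNumber 2.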

If you're looking for a way to do this for JSON in general, you could investigate the JSON functions in the query reference.

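I wasn't able to get your exact example working with those functions, but depending on your JSONPath-fu and your JSON structure, you may be able to get something working. For example, this gets the values in the first row. Note, however, that the output is stringified, so you get the string "[1,2,3]", but you could probably parse that into the right format with some string functions and SPLIT.
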
SELECT 
  JSON_EXTRACT(input, '$[0].rowNumber') as rowNumber,
  JSON_EXTRACT(input, '$[0].index') as index
FROM
  (SELECT '[
   {"rowNumber":1,
    "index": [1,2,3]
   },
   {"rowNumber":2,
    "index": [2,7,8,15]
   }
]' as input);
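If the standard SQL dialect is available, the whole JSON string can be expanded into rows rather than addressed one element at a time. The following is a sketch under that assumption; it relies on JSON_EXTRACT_ARRAY and JSON_EXTRACT_SCALAR, and the names `raw` and `row_json` are illustrative.

```sql
#standardSQL
WITH raw AS (
  SELECT '[{"rowNumber":1,"index":[1,2,3]},{"rowNumber":2,"index":[2,7,8,15]}]' AS input
)
SELECT
  -- Scalars come back as strings, so cast them to integers.
  CAST(JSON_EXTRACT_SCALAR(row_json, '$.rowNumber') AS INT64) AS rowNumber,
  -- Expand the nested JSON array and aggregate over its elements.
  (SELECT MAX(CAST(i AS INT64))
   FROM UNNEST(JSON_EXTRACT_ARRAY(row_json, '$.index')) AS i) AS max_index
FROM raw,
  UNNEST(JSON_EXTRACT_ARRAY(raw.input, '$')) AS row_json  -- one element per JSON record
ORDER BY rowNumber;
```

Each element of the outer array becomes one row, so the query does not depend on knowing how many records the string contains.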

Note: I am focusing on this part of the question: "I sometimes want to test certain BQ functions and SQL statements in the BQ console without creating a test table in my dataset."

I see a few cases (there may be more, but the three below should at least give you a good start).

Case #1 – Super Simple - no record type fields involved

Example:

SELECT a, b, c, d 
FROM 
  (SELECT 1 AS a, 'x' AS b, 'Voilà la séance qui est à Paris.' AS c, '[{"rowNumber":1,"index": [1,2,3]},{"rowNumber":2,"index": [2,7,8,15]}]' AS d),
  (SELECT 2 AS a, 'y' AS b, 'That session is in Paris.' AS c, '[{"rowNumber":3,"index": [4,5]},{"rowNumber":4,"index": [20, 23, 39]}]' AS d),
  (SELECT 3 AS a, 'z' AS b, 'Эта сессия в Париже.' AS c, '[{"rowNumber":5,"index": [6,7,8,9]},{"rowNumber":6,"index": [15, 45]}]' AS d)

So now you can use this "dummy" table to experiment with your code, as shown below:

SELECT 
  a, b,
  REGEXP_EXTRACT(c, r'(à)') AS extract,
  REGEXP_MATCH(c, r'(à)') AS match,
  JSON_EXTRACT(d, '$[1].index[0]') AS index
FROM (
  SELECT a, b, c, d 
  FROM 
    (SELECT 1 AS a, 'x' AS b, 'Voilà la séance qui est à Paris.' AS c, '[{"rowNumber":1,"index": [1,2,3]},{"rowNumber":2,"index": [2,7,8,15]}]' AS d),
    (SELECT 2 AS a, 'y' AS b, 'That session is in Paris.' AS c, '[{"rowNumber":3,"index": [4,5]},{"rowNumber":4,"index": [20, 23, 39]}]' AS d),
    (SELECT 3 AS a, 'z' AS b, 'Эта сессия в Париже.' AS c, '[{"rowNumber":5,"index": [6,7,8,9]},{"rowNumber":6,"index": [15, 45]}]' AS d)
)
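In the standard SQL dialect, a WITH clause lets the dummy table be written once and then queried by name. This is a sketch assuming standard SQL is enabled; REGEXP_CONTAINS stands in for the legacy REGEXP_MATCH, and `sample` plus the output column names are illustrative.

```sql
#standardSQL
WITH sample AS (
  SELECT 1 AS a, 'x' AS b, 'Voilà la séance qui est à Paris.' AS c,
         '[{"rowNumber":1,"index": [1,2,3]},{"rowNumber":2,"index": [2,7,8,15]}]' AS d
  UNION ALL
  SELECT 2, 'y', 'That session is in Paris.',
         '[{"rowNumber":3,"index": [4,5]},{"rowNumber":4,"index": [20, 23, 39]}]'
  UNION ALL
  SELECT 3, 'z', 'Эта сессия в Париже.',
         '[{"rowNumber":5,"index": [6,7,8,9]},{"rowNumber":6,"index": [15, 45]}]'
)
SELECT
  a, b,
  REGEXP_EXTRACT(c, r'(à)') AS extract_result,   -- EXTRACT is a reserved word, hence the longer name
  REGEXP_CONTAINS(c, r'à') AS is_match,
  JSON_EXTRACT(d, '$[1].index[0]') AS first_index
FROM sample
ORDER BY a;
```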

Case #2 – Simple with Record

If your record has just one nested field, here is one way:

SELECT rowNumber, NEST(index) AS index
FROM 
  (SELECT 1 AS rowNumber, 1 AS index),
  (SELECT 1 AS rowNumber, 2 AS index),
  (SELECT 1 AS rowNumber, 3 AS index),
  (SELECT 2 AS rowNumber, 2 AS index),
  (SELECT 2 AS rowNumber, 3 AS index),
  (SELECT 2 AS rowNumber, 8 AS index),
  (SELECT 2 AS rowNumber, 15 AS index)
GROUP BY rowNumber

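You can use this as a stand-in in your experiments with "simple" record fields. By the way, to confirm for yourself that this is really two rows and not seven, run the following: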

SELECT COUNT(1) AS rows FROM (
  SELECT rowNumber, NEST(index) AS index
  FROM 
    (SELECT 1 AS rowNumber, 1 AS index),
    (SELECT 1 AS rowNumber, 2 AS index),
    (SELECT 1 AS rowNumber, 3 AS index),
    (SELECT 2 AS rowNumber, 2 AS index),
    (SELECT 2 AS rowNumber, 3 AS index),
    (SELECT 2 AS rowNumber, 8 AS index),
    (SELECT 2 AS rowNumber, 15 AS index)
  GROUP BY rowNumber
)
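NEST is a legacy SQL function; in the standard SQL dialect the corresponding aggregate is ARRAY_AGG. A sketch of the same two-row result under that assumption (the names `flat` and `idx` are illustrative):

```sql
#standardSQL
WITH flat AS (
  SELECT 1 AS rowNumber, 1 AS idx UNION ALL
  SELECT 1, 2 UNION ALL
  SELECT 1, 3 UNION ALL
  SELECT 2, 2 UNION ALL
  SELECT 2, 3 UNION ALL
  SELECT 2, 8 UNION ALL
  SELECT 2, 15
)
-- Collapse the seven flat rows into one row per rowNumber,
-- each carrying an array of its index values.
SELECT rowNumber, ARRAY_AGG(idx ORDER BY idx) AS `index`
FROM flat
GROUP BY rowNumber
ORDER BY rowNumber;
```

Because the arrays are real ARRAY values here, the result has two rows without any further flattening step.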

Case #3 – Schema with Record of arbitrary complexity, like in example in your question

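If you want to experiment with an arbitrary schema, you should first play with how to create such a schema in GBQ using a JS UDF. Check out the example below.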

Once you have the hang of it, you can mimic a table of any complexity in GBQ and use it as a sub-select (instead of a real table) to experiment with GBQ features.