Can an entire table (as a string) be included in a SQL statement for BigQuery?
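Sometimes I want to test certain BQ functions and SQL statements in the BQ console without creating a test table in my dataset. For example, I can test regexp_match in the console with the following: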
Select Regexp_extract(StringToParse,r'\b(à)\b') as Extract,
Regexp_match(StringToParse,r'\b(à)\b') as match,
FROM
(SELECT 'Voilà la séance qui est à Paris.' as StringToParse)
I would like to do the same thing with full tables, possibly given as a JSON string.
For example, if I have a test table with two records:
[
{"rowNumber":1,
"index": [1,2,3]
},
{"rowNumber":2,
"index": [2,7,8,15]
}
]
Can I hand that table to BQ SQL for testing? Something like:
Select max(index) as max from parse('long json string')....
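I know that no schema is given, so an on-the-fly table may not be possible.
The schema would be as follows (well, for the array of integers I probably need a 'record', which is exactly the kind of thing I want to test):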
[
{
"name":"rowNumber",
"type":"integer"
},
{
"name": "index",
"type": "record" (oops, can't put an array of integers here...)
}
]
I'm pretty sure BigQuery supports syntax like this:
select *
from (select 'a' as cola, 1 as col1) a,
(select 'b', 2) b;
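That is, you can define a "table" inside the query itself using select and union all. (In BigQuery legacy SQL, a comma between tables in the FROM clause acts as UNION ALL rather than as a join, which is why the query above produces two rows.)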
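For comparison, here is a minimal sketch of the same two-row inline table with UNION ALL written out explicitly. It assumes standard SQL (enabled here with the `#standardSQL` prefix) rather than the legacy SQL dialect used above; the CTE name `t` is only illustrative.

```sql
#standardSQL
-- Inline two-row "table" built from literals; no dataset table is needed.
WITH t AS (
  SELECT 'a' AS cola, 1 AS col1
  UNION ALL
  SELECT 'b', 2          -- column names are taken from the first SELECT
)
SELECT cola, col1
FROM t
ORDER BY col1;
```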
Based on your sample data, the output schema you want is:
[
{
"name":"rowNumber",
"type":"integer"
},
{
"name": "index",
"type": "integer",
"mode": "repeated"
}
]
Here is something that works for your example, finding the MAX of each index. The "SELECT NULL" in the innermost SELECT is unfortunate, but BigQuery complains when SPLIT is used without a FROM clause.
SELECT rowNumber, MAX(index) AS max_index FROM
(SELECT 1 AS rowNumber, INTEGER(SPLIT('1,2,3')) AS index FROM (SELECT NULL)),
(SELECT 2 AS rowNumber, INTEGER(SPLIT('2,7,8,15')) AS index FROM (SELECT NULL))
GROUP BY rowNumber
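If standard SQL is an option (an assumption; the query above is legacy SQL), the repeated integer field can be written directly as an array literal, so neither SPLIT nor the "SELECT NULL" workaround is needed. The following is a sketch under that assumption, with `test_table` and `i` as illustrative names.

```sql
#standardSQL
-- Each row carries its own ARRAY<INT64>, matching the
-- rowNumber INTEGER / index INTEGER REPEATED schema above.
WITH test_table AS (
  SELECT 1 AS rowNumber, [1, 2, 3] AS index
  UNION ALL
  SELECT 2, [2, 7, 8, 15]
)
SELECT
  rowNumber,
  -- Correlated subquery: flatten this row's array and take its maximum.
  (SELECT MAX(i) FROM UNNEST(index) AS i) AS max_index
FROM test_table
ORDER BY rowNumber;
```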
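If you are looking for a way to do this generically for JSON, you could look into the JSON functions in the query reference.
I could not get your exact example working with those functions, but depending on your JSONPath-fu and JSON structure, you may be able to get something working. For example, this gets the values in the first row. Note, however, that the output is stringified, so you get the string "[1,2,3]"; you could probably parse it into the right format with some string functions and SPLIT.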
SELECT
JSON_EXTRACT(input, '$[0].rowNumber') as rowNumber,
JSON_EXTRACT(input, '$[0].index') as index
FROM
(SELECT '[
{"rowNumber":1,
"index": [1,2,3]
},
{"rowNumber":2,
"index": [2,7,8,15]
}
]' as input);
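As a sketch of how the whole JSON string could be turned into typed rows, the query below assumes standard SQL and its `JSON_EXTRACT_ARRAY` and `JSON_EXTRACT_SCALAR` functions, which are not what the legacy SQL query above uses. The names `raw`, `row_json`, and `v` are illustrative.

```sql
#standardSQL
WITH raw AS (
  SELECT '[{"rowNumber":1,"index":[1,2,3]},{"rowNumber":2,"index":[2,7,8,15]}]' AS input
)
SELECT
  CAST(JSON_EXTRACT_SCALAR(row_json, '$.rowNumber') AS INT64) AS rowNumber,
  -- Each array element comes back as a JSON-formatted string, so cast it.
  ARRAY(
    SELECT CAST(v AS INT64)
    FROM UNNEST(JSON_EXTRACT_ARRAY(row_json, '$.index')) AS v
  ) AS index
FROM raw,
  -- One output row per element of the top-level JSON array.
  UNNEST(JSON_EXTRACT_ARRAY(input)) AS row_json;
```

The result has one row per JSON object, with `index` as a real integer array instead of the stringified "[1,2,3]".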
Note: I am answering/focusing on this part of the question: "I sometimes want to test certain BQ functions and sql statements in the BQ console without creating a test table in my dataset"
I see a few cases (there may be more, but the three below should at least give you a good start).
Case #1 – Super Simple - no record type fields involved
Example:
SELECT a, b, c, d
FROM
(SELECT 1 AS a, 'x' AS b, 'Voilà la séance qui est à Paris.' AS c, '[{"rowNumber":1,"index": [1,2,3]},{"rowNumber":2,"index": [2,7,8,15]}]' AS d),
(SELECT 2 AS a, 'y' AS b, 'That session is in Paris.' AS c, '[{"rowNumber":3,"index": [4,5]},{"rowNumber":4,"index": [20, 23, 39]}]' AS d),
(SELECT 3 AS a, 'z' AS b, 'Эта сессия в Париже.' AS c, '[{"rowNumber":5,"index": [6,7,8,9]},{"rowNumber":6,"index": [15, 45]}]' AS d)
So now you can use this "dummy" table to experiment with your code, as shown below:
SELECT
a, b,
REGEXP_EXTRACT(c, r'(à)') AS extract,
REGEXP_MATCH(c, r'(à)') AS match,
JSON_EXTRACT(d, '$[1].index[0]') AS index
FROM (
SELECT a, b, c, d
FROM
(SELECT 1 AS a, 'x' AS b, 'Voilà la séance qui est à Paris.' AS c, '[{"rowNumber":1,"index": [1,2,3]},{"rowNumber":2,"index": [2,7,8,15]}]' AS d),
(SELECT 2 AS a, 'y' AS b, 'That session is in Paris.' AS c, '[{"rowNumber":3,"index": [4,5]},{"rowNumber":4,"index": [20, 23, 39]}]' AS d),
(SELECT 3 AS a, 'z' AS b, 'Эта сессия в Париже.' AS c, '[{"rowNumber":5,"index": [6,7,8,9]},{"rowNumber":6,"index": [15, 45]}]' AS d)
)
Case #2 – Simple with Record
If your record has just one nested field, the following is the way to go:
SELECT rowNumber, NEST(index) AS index
FROM
(SELECT 1 AS rowNumber, 1 AS index),
(SELECT 1 AS rowNumber, 2 AS index),
(SELECT 1 AS rowNumber, 3 AS index),
(SELECT 2 AS rowNumber, 2 AS index),
(SELECT 2 AS rowNumber, 3 AS index),
(SELECT 2 AS rowNumber, 8 AS index),
(SELECT 2 AS rowNumber, 15 AS index)
GROUP BY rowNumber
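You can use this as a stand-in when experimenting with "simple" record fields.
By the way, to confirm for yourself that this really is two rows and not 7, run the following: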
SELECT COUNT(1) AS rows FROM (
SELECT rowNumber, NEST(index) AS index
FROM
(SELECT 1 AS rowNumber, 1 AS index),
(SELECT 1 AS rowNumber, 2 AS index),
(SELECT 1 AS rowNumber, 3 AS index),
(SELECT 2 AS rowNumber, 2 AS index),
(SELECT 2 AS rowNumber, 3 AS index),
(SELECT 2 AS rowNumber, 8 AS index),
(SELECT 2 AS rowNumber, 15 AS index)
GROUP BY rowNumber
)
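NEST is a legacy SQL function. Assuming standard SQL instead, a roughly equivalent sketch uses `ARRAY_AGG` to collapse the seven flat rows into two rows with a repeated field; `flat` and `index_value` are illustrative names.

```sql
#standardSQL
WITH flat AS (
  SELECT 1 AS rowNumber, 1 AS index_value UNION ALL
  SELECT 1, 2 UNION ALL
  SELECT 1, 3 UNION ALL
  SELECT 2, 2 UNION ALL
  SELECT 2, 3 UNION ALL
  SELECT 2, 8 UNION ALL
  SELECT 2, 15
)
-- One output row per rowNumber, with the values gathered into an array.
SELECT rowNumber, ARRAY_AGG(index_value ORDER BY index_value) AS index
FROM flat
GROUP BY rowNumber;
```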
Case #3 – Schema with Record of arbitrary complexity, like in example in your question
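If you want to experiment with an arbitrary schema, you should first play a bit with how to create such a schema in GBQ using a JS UDF.
Check the examples below.
Once you get the hang of it, you can mimic any table of any complexity in GBQ and use it as a sub-select (instead of a real table) to experiment with GBQ functionality.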
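The following is not the JS UDF approach described above. It is a different sketch that assumes standard SQL, where nested and repeated records can be written as `STRUCT` and array literals directly in a sub-select. The field names `items` and `label` are hypothetical and only illustrate a record nested one level deeper than the question's schema.

```sql
#standardSQL
-- items is ARRAY<STRUCT<label STRING, index ARRAY<INT64>>>:
-- a repeated record that itself contains a repeated integer field.
WITH test_table AS (
  SELECT
    1 AS rowNumber,
    [STRUCT('a' AS label, [1, 2, 3] AS index),
     STRUCT('b' AS label, [4, 5] AS index)] AS items
  UNION ALL
  SELECT
    2,
    [STRUCT('c' AS label, [2, 7, 8, 15] AS index)]
)
SELECT t.rowNumber, item.label, MAX(i) AS max_index
FROM test_table AS t,
  UNNEST(t.items) AS item,     -- one row per nested record
  UNNEST(item.index) AS i      -- one row per integer inside that record
GROUP BY t.rowNumber, item.label
ORDER BY t.rowNumber, item.label;
```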