How can I pass a row from my table to a UDF without specifying the complete type?
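Suppose I want to do some processing, using a JavaScript UDF, over a table that has a nested structure (such as the sample GitHub commits). I may want to change which fields the UDF looks at as I iterate on its implementation, so I decided to just pass entire rows from the table to it. My UDF ends up looking something like this: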
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
  input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
               author STRUCT<name STRING, email STRING, ...>>)
RETURNS STRUCT<
  parent ARRAY<STRING>,
  author_name STRING,
  diff_count INT64>
LANGUAGE js AS """
  [UDF content here]
""";
I then call the function with a query such as:
SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
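The most cumbersome part of the UDF declaration is the input struct, since I have to include all of the nested fields and their types. Is there a better way to do this?
You can use TO_JSON_STRING to convert arbitrary structs and arrays to JSON, and then parse the string into an object inside your UDF for further processing. For example,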
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
RETURNS STRUCT<
  parent ARRAY<STRING>,
  author_name STRING,
  diff_count INT64>
LANGUAGE js AS """
  var row = JSON.parse(json_str);
  var result = new Object();
  result['parent'] = row.parent;
  result['author_name'] = row.author.name;
  result['diff_count'] = row.difference.length;
  return result;
""";
SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
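Since the UDF body is ordinary JavaScript operating on a JSON string, it may be convenient to try it outside BigQuery while iterating on which fields to read. The following is a minimal sketch for Node.js, not part of the query itself. The function name getCommitStats and the sample row are made up for illustration; the row only imitates the shape that TO_JSON_STRING would plausibly produce for the parent, author, and difference columns.

```javascript
// Plain-JavaScript copy of the UDF body, wrapped in a function so it
// can be run locally. The name getCommitStats is hypothetical.
function getCommitStats(jsonStr) {
  const row = JSON.parse(jsonStr);
  return {
    parent: row.parent,
    author_name: row.author.name,
    diff_count: row.difference.length,
  };
}

// Made-up sample row (assumption): only the fields the function reads
// matter; the values and the nested field names are illustrative.
const sampleRow = JSON.stringify({
  parent: ["abc123"],
  author: { name: "Jane Doe", email: "jane@example.com" },
  difference: [
    { old_path: "a.txt", new_path: "a.txt" },
    { old_path: "b.txt", new_path: "b.txt" },
  ],
});

console.log(getCommitStats(sampleRow));
```

Because the function only touches the fields it names, extra fields in the JSON are ignored, which is what lets the SQL side pass the whole row without declaring its type.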
If you want to reduce the number of columns scanned, you can pass a struct containing only the relevant columns to TO_JSON_STRING:
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
RETURNS STRUCT<
  parent ARRAY<STRING>,
  author_name STRING,
  diff_count INT64>
LANGUAGE js AS """
  var row = JSON.parse(json_str);
  var result = new Object();
  result['parent'] = row.parent;
  result['author_name'] = row.author.name;
  result['diff_count'] = row.difference.length;
  return result;
""";
SELECT
  GetCommitStats(TO_JSON_STRING(
    STRUCT(parent, author, difference)
  )).*
FROM `bigquery-public-data.github_repos.sample_commits`;