How can I pass a row from my table to a UDF without specifying the complete type?

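Let's say that I want to use a JavaScript UDF to do some processing on a table that has a nested structure (such as the sample GitHub commits). I may want to change which fields the UDF looks at as I iterate on its implementation, so I decide to just pass entire rows from the table to it. My UDF ends up looking something like this: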

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
  input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
               author STRUCT<name STRING, email STRING, ...>>)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
[UDF content here]
""";

Then I call the function with a query such as:

SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;

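The most cumbersome part of the UDF declaration is the input struct, since I have to include all of the nested fields and their types. Is there a better way to do this?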

You can use TO_JSON_STRING to convert arbitrary structs and arrays to JSON, then parse the string into an object inside your UDF for further processing. For example:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
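// Parse the JSON-encoded row back into a plain JavaScript object.
var row = JSON.parse(json_str);
var result = new Object();
// Keys must match the field names of the RETURNS struct.
result['parent'] = row.parent;
result['author_name'] = row.author.name;
// The repeated `difference` field arrives as a JavaScript array.
result['diff_count'] = row.difference.length;
return result;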
""";

SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
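
Since the UDF body is now plain JavaScript operating on a JSON string, it can be tried outside BigQuery while iterating on it. The following is a minimal sketch, assuming Node.js is available; the wrapper name getCommitStats and the sample row are made up for illustration, and the row only approximates the shape that TO_JSON_STRING(t) would produce for this table.

```javascript
// Standalone copy of the UDF body so that it can be run locally.
// The function name getCommitStats is an assumption for local testing.
function getCommitStats(jsonStr) {
  var row = JSON.parse(jsonStr);
  var result = new Object();
  result['parent'] = row.parent;
  result['author_name'] = row.author.name;
  result['diff_count'] = row.difference.length;
  return result;
}

// Hypothetical row with made-up values. Only the fields that the
// function reads need to be present for this local check.
var sampleJson = JSON.stringify({
  commit: 'abc123',
  parent: ['def456'],
  author: { name: 'Jane Doe', email: 'jane@example.com' },
  difference: [{ new_path: 'a.txt' }, { new_path: 'b.txt' }]
});

console.log(getCommitStats(sampleJson));
```

Once the logic behaves as expected, the function body can be pasted back between the triple quotes of the CREATE TEMP FUNCTION statement.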

If you want to cut down on the number of columns scanned, you can pass a struct of just the relevant columns to TO_JSON_STRING instead:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
var row = JSON.parse(json_str);
var result = new Object();
result['parent'] = row.parent;
result['author_name'] = row.author.name;
result['diff_count'] = row.difference.length;
return result;
""";

SELECT
  GetCommitStats(TO_JSON_STRING(
    STRUCT(parent, author, difference)
  )).*
FROM `bigquery-public-data.github_repos.sample_commits`;
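
One thing to watch with the reduced struct: a column left out of STRUCT(...) is simply absent from the parsed object, so an expression such as row.author.name would throw inside the UDF. The following is a sketch of a more defensive body, not part of the approach above; the name getCommitStatsSafe and the input string are hypothetical, and it assumes that returning null for a field is acceptable for your use case.

```javascript
// Hypothetical defensive variant of the UDF body. The name
// getCommitStatsSafe is an assumption made for this sketch.
function getCommitStatsSafe(jsonStr) {
  var row = JSON.parse(jsonStr);
  var author = row.author || {};          // tolerate a missing or null author
  var difference = row.difference || [];  // tolerate a missing or null difference
  return {
    parent: row.parent || [],
    author_name: author.name === undefined ? null : author.name,
    diff_count: difference.length
  };
}

// Made-up input in which `author` was left out of the struct.
console.log(getCommitStatsSafe('{"parent":["def456"],"difference":[]}'));
```

The trade-off is that a forgotten column now yields a null or zero instead of a query error, so this may be preferable only once the set of columns has settled.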