How to compute number of friends of friends in a BigQuery table with repeated records?

I have a BigQuery table in the following format:

person  friends.name  friends.year
John    Mary          1977
        Mike          1984
Mary    John          1980
Mike    John          1977
        Jane          1971

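I want to compute the maximum year for each person in a separate column, and for each friends record I want to compute how many friends that friend has (this could be done with a self-join or a window function).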

I am not sure how to write this query. My approach so far is:

SELECT person, 
   ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f), 
   ARRAY_LENGTH(friends) AS number_friends
FROM table

However, this does not compute the number of friends for each struct value in the array. This is the output I am expecting:

person  friends.name  friends.year  friends.num_friends  max_year
John    Mary          1977          1                    1984
        Mike          1984          2
Mary    John          1980          2                    1980
Mike    John          1977          2                    1977
        Jane          1971          0

How can I write this query in an optimized way?

Consider the approach below:

with friends_count as (
  -- every name that appears as a friend, with that person's own friend count
  -- (0 when the name has no row of its own in your_table, e.g. Jane)
  select person, ifnull(num_friends, 0) num_friends from (
    select distinct name as person
    from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array(
    -- rebuild each friend struct, attaching that friend's count
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  (select max(year) from t.friends) max_year
from your_table t
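
To try this without a real table, one option is to define the sample data inline. The following is a minimal sketch, assuming the question's table has the schema person STRING, friends ARRAY<STRUCT<name STRING, year INT64>>. The inline your_table CTE is a hypothetical stand-in for the real table; the rest is the same query as above.

```sql
-- Hypothetical inline copy of the question's sample data (schema is an assumption).
with your_table as (
  select 'John' as person,
         array<struct<name string, year int64>>[('Mary', 1977), ('Mike', 1984)] as friends
  union all
  select 'Mary',
         array<struct<name string, year int64>>[('John', 1980)]
  union all
  select 'Mike',
         array<struct<name string, year int64>>[('John', 1977), ('Jane', 1971)]
),
friends_count as (
  -- every name that appears as a friend, with that person's own friend count
  select person, ifnull(num_friends, 0) num_friends from (
    select distinct name as person
    from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array(
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  (select max(year) from t.friends) max_year
from your_table t
```

Note that neither the row order nor the order of elements inside the rebuilt friends array is guaranteed without an explicit order by, so the result may list friends in a different order than the table in the question.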

If applied to the sample data in your question, the output is