How to compute number of friends of friends in a BigQuery table with repeated records?
I have a BigQuery table in the following format:
| person | friends.name | friends.year |
|---|---|---|
| John | Mary | 1977 |
| | Mike | 1984 |
| Mary | John | 1980 |
| Mike | John | 1977 |
| | Jane | 1971 |
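For each person, I want to compute the maximum year in a separate column, and for each friends record I want to compute how many friends that friend has (this could be done with a self-join or a window function).
I'm not sure how to write this query. My approach so far is: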
SELECT person,
ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f),
ARRAY_LENGTH(friends) AS number_friends
FROM table
However, this does not compute the number of friends for each struct in the array. This is the output I expect:
| person | friends.name | friends.year | friends.num_friends | max_year |
|---|---|---|---|---|
| John | Mary | 1977 | 1 | 1984 |
| | Mike | 1984 | 2 | |
| Mary | John | 1980 | 2 | 1980 |
| Mike | John | 1977 | 2 | 1977 |
| | Jane | 1971 | 0 | |
How can I write this query in an optimized way?
Consider the approach below:
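with friends_count as (
  -- every name that appears as a friend, with that person's own friend count;
  -- names that have no row of their own in your_table (e.g. Jane) get 0
  select person, ifnull(num_friends, 0) num_friends from (
    select distinct name as person
    from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array(
    -- rebuild each friends struct with the friend's own count attached
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  -- latest year among this person's friends
  (select max(year) from t.friends) max_year
from your_table t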
If applied to the sample data in your question, the output matches the expected result shown above.
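To try the approach without creating a table first, the sample rows can be inlined as a CTE. The following is a minimal sketch, assuming BigQuery Standard SQL: the inline `your_table` CTE is a hypothetical stand-in for the real table, and the aliases `f` and `c` are added here only to make the join columns explicit.

```sql
-- Hypothetical inline copy of the sample data from the question;
-- drop this CTE and point the query at your real table instead.
with your_table as (
  select 'John' as person,
         [struct('Mary' as name, 1977 as year),
          struct('Mike' as name, 1984 as year)] as friends
  union all
  select 'Mary', [struct('John' as name, 1980 as year)]
  union all
  select 'Mike', [struct('John' as name, 1977 as year),
                  struct('Jane' as name, 1971 as year)]
),
friends_count as (
  -- one row per distinct friend name, with that person's own friend count;
  -- names without a row of their own (Jane) fall back to 0
  select person, ifnull(num_friends, 0) as num_friends
  from (
    select distinct name as person
    from your_table, unnest(friends)
  )
  left join (
    select person, array_length(friends) as num_friends
    from your_table
  ) using (person)
)
select
  t.person,
  array(
    -- attach each friend's count to the original struct
    select as struct f.name, f.year, c.num_friends
    from t.friends f
    join friends_count c on f.name = c.person
  ) as friends,
  (select max(year) from t.friends) as max_year
from your_table t
order by t.person
```

The order of elements inside each `friends` array is not guaranteed here; adding an `order by` inside the `array(...)` subquery would pin it down if that matters.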