Compare AVG of rows in a group with all rows that are NOT in the group (BigQuery)
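I have a dataset that looks like this:
date | grp_name | uid | value_a | value_b | value_c |
---|---|---|---|---|---|
2022-01-01 | A | 1 | 1 | 10 | 5 |
2022-01-01 | B | 2 | 7 | 1 | 20 |
2022-01-01 | C | 10 | 7 | 3 | 20 |
2022-01-01 | A | 3 | 3 | 12 | 4 |
2022-01-02 | B | 2 | 6 | 1 | 21 |
2022-01-02 | B | 5 | 3 | 4 | 19 |
2022-01-03 | A | 6 | 1 | 15 | 6 |
2022-01-03 | C | 7 | 8 | 2 | 22 |
2022-01-03 | D | 9 | 10 | 2 | 18 |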
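For each date and each grp_name, I want to compute the AVG of value_a, value_b and value_c over all rows in the group, and (this is where I run into problems) the AVG of value_a, value_b and value_c over all rows that are NOT in the group.
Expected output for grp_name = A and date = 2022-01-01. I want to generate an in_grp column that separates the averages of the group at hand from the averages of the non-group members.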
date | grp_name | in_grp | value_a | value_b | value_c |
---|---|---|---|---|---|
2022-01-01 | A | TRUE | 2 | 11 | 4.5 |
2022-01-01 | A | FALSE | 7 | 2 | 20 |
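Here is the simple query I have written so far. It cannot select the non-group members for the averages, and it does not create the in_grp column that separates group members from non-group members: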
SELECT
  date,
  grp_name,
  AVG(value_a) AS value_a,
  AVG(value_b) AS value_b,
  AVG(value_c) AS value_c
FROM table
GROUP BY date, grp_name
Any suggestions on how to solve this?
Consider the approach below:
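with temp as (
  -- one row per (date, grp_name), carrying the totals for its date and for the group itself;
  -- the "all" windows are partitioned by date so that non-members come from the same date,
  -- as in the expected output of the question
  select distinct date, grp_name,
    count(*) over(partition by date) count_all,
    count(*) over(partition by date, grp_name) count_in_grp,
    sum(value_a) over(partition by date) sum_a,
    sum(value_a) over(partition by date, grp_name) sum_a_in_grp,
    sum(value_b) over(partition by date) sum_b,
    sum(value_b) over(partition by date, grp_name) sum_b_in_grp,
    sum(value_c) over(partition by date) sum_c,
    sum(value_c) over(partition by date, grp_name) sum_c_in_grp
  from your_table
)
-- averages of the rows inside each group
select date, grp_name, true as in_grp,
  sum_a_in_grp / count_in_grp as value_a,
  sum_b_in_grp / count_in_grp as value_b,
  sum_c_in_grp / count_in_grp as value_c
from temp
union all
-- averages of the rows outside each group: (date total - group total) / (rows outside the group);
-- safe_divide returns null when a date has no rows outside the group
select date, grp_name, false as in_grp,
  safe_divide(sum_a - sum_a_in_grp, count_all - count_in_grp) as value_a,
  safe_divide(sum_b - sum_b_in_grp, count_all - count_in_grp) as value_b,
  safe_divide(sum_c - sum_c_in_grp, count_all - count_in_grp) as value_c
from temp
-- order by date, grp_name, in_grp desc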
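If applied to the sample data in the question, the output has two rows for each date and grp_name: one with in_grp = true and one with in_grp = false.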
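As a complementary sketch, the same comparison can also be written with a join instead of running totals: every (date, grp_name) pair is joined to all rows on the same date, and each row is flagged by whether it belongs to that group. The inline `your_table` CTE is a hypothetical copy of the sample data from the question, and `grp` is an illustrative name. This assumes non-members are meant to be taken from the same date, as the expected output for A on 2022-01-01 suggests.

```sql
with your_table as (
  -- hypothetical inline copy of the sample data; replace with the real table
  select * from unnest([
    struct(date '2022-01-01' as date, 'A' as grp_name, 1 as uid,
           1 as value_a, 10 as value_b, 5 as value_c),
    (date '2022-01-01', 'B', 2, 7, 1, 20),
    (date '2022-01-01', 'C', 10, 7, 3, 20),
    (date '2022-01-01', 'A', 3, 3, 12, 4),
    (date '2022-01-02', 'B', 2, 6, 1, 21),
    (date '2022-01-02', 'B', 5, 3, 4, 19),
    (date '2022-01-03', 'A', 6, 1, 15, 6),
    (date '2022-01-03', 'C', 7, 8, 2, 22),
    (date '2022-01-03', 'D', 9, 10, 2, 18)
  ])
),
grp as (
  -- every (date, grp_name) pair that occurs in the data
  select distinct date, grp_name
  from your_table
)
select
  g.date,
  g.grp_name,
  -- true for rows of the group itself, false for other rows on the same date
  t.grp_name = g.grp_name as in_grp,
  avg(t.value_a) as value_a,
  avg(t.value_b) as value_b,
  avg(t.value_c) as value_c
from grp as g
join your_table as t
  on t.date = g.date
group by 1, 2, 3
order by 1, 2, 3 desc
```

For grp_name = A on 2022-01-01 this gives 2, 11, 4.5 for in_grp = true and 7, 2, 20 for in_grp = false. A date with a single group, such as 2022-01-02, simply gets no in_grp = false row. The join reads each row once per group on its date, so a window-based query may be cheaper on large tables.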