Partition By 2 Columns in BigQuery
Say I want to rank the top 3 months by revenue for each Flat, along with the revenue amounts. I can do the following with no problem:
Select * FROM (SELECT Flat,
EXTRACT(YEAR FROM pay_day) AS Year,
EXTRACT(MONTH FROM pay_day) AS Month,
RANK() OVER(PARTITION BY Flat ORDER BY SUM(USD_amt) DESC) AS rank,
SUM(USD_amt) AS Revenue
FROM `finances.reservations` AS f
GROUP BY Flat, Year, Month)
WHERE rank<=3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month
ASC
Result:
Flat Year Month Rank Revenue
1 2019 12 1 3281
1 2019 4 2 3031
1 2020 7 3 3019
1A 2020 11 1 1805
1A 2020 9 2 1178
1A 2020 5 3 1166
2 2020 1 1 3419
2 2020 7 2 2644
2 2021 1 3 2460
3 2019 10 3 2466
3 2020 6 1 2558
3 2020 1 2 2530
4 2020 7 1 0
But now say I want the top 3 months for each Flat per year. I thought I would only need to add the year to the partition, as follows:
RANK() OVER(PARTITION BY Flat, EXTRACT(YEAR FROM pay_day) ORDER BY SUM(USD_amt) DESC) AS rank
I would expect the result to look like this:
Flat Year Month Rank Revenue
1 2019 12 1 3281
1 2019 4 2 3031
1 2019 1 3 2031
1 2020 4 1 3031
1 2020 9 2 3001
1 2020 7 3 2919
But this results in the error "PARTITION BY expression references column pay_day which is neither grouped nor aggregated at [4:50]". What am I doing wrong?
The table schema is as follows:
Field name | Type | Mode
----------------------+---------+------------
Flat | STRING | NULLABLE
pay_day | DATE | NULLABLE
nights | INTEGER | NULLABLE
check_in | DATE | NULLABLE
check_out | DATE | NULLABLE
nights__in_month_ | STRING | NULLABLE
nights_outside_month_ | STRING | NULLABLE
cleaning | INTEGER | NULLABLE
currency | STRING | NULLABLE
USD_amt | INTEGER | NULLABLE
EANR | STRING | NULLABLE
name | STRING | NULLABLE
people | INTEGER | NULLABLE
country | STRING | NULLABLE
reservation_no_ | STRING | NULLABLE
payment_processor | STRING | NULLABLE
Check_in_day | STRING | NULLABLE
Cleaner | STRING | NULLABLE
Review | STRING | NULLABLE
I think you can reference the column alias in BigQuery:
SELECT *
FROM (SELECT Flat, EXTRACT(YEAR FROM pay_day) AS Year,
EXTRACT(MONTH FROM pay_day) AS Month,
RANK() OVER (PARTITION BY Flat, Year ORDER BY SUM(USD_amt) DESC) AS rank,
SUM(USD_amt) AS Revenue
FROM `finances.reservations` AS f
GROUP BY Flat, Year, Month
) ym
WHERE rank <= 3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month ASC
If not, you can use:
RANK() OVER (PARTITION BY Flat, EXTRACT(YEAR FROM MIN(pay_day))
ORDER BY SUM(USD_amt) DESC
) AS rank,
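The minimum date within each GROUP BY group falls in the same year as every other date in that group, so this is effectively the same thing. I realize this is how I usually handle this problem.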
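Another way to sidestep the error, sketched here rather than tested against the actual table, is to aggregate first and rank in a separate step. By the time the window function runs, Year is an ordinary column of the intermediate result, so it no longer matters whether aliases can be referenced inside OVER. The CTE names monthly and ranked and the alias revenue_rank are made up for this illustration; the table and column names come from the question.

```sql
-- Step 1: aggregate revenue per Flat, year and month.
WITH monthly AS (
  SELECT
    Flat,
    EXTRACT(YEAR FROM pay_day) AS Year,
    EXTRACT(MONTH FROM pay_day) AS Month,
    SUM(USD_amt) AS Revenue
  FROM `finances.reservations`
  WHERE Flat IS NOT NULL
  GROUP BY Flat, Year, Month
),
-- Step 2: rank the months within each (Flat, Year) partition.
-- Year is a plain column here, so PARTITION BY can use it directly.
ranked AS (
  SELECT
    Flat,
    Year,
    Month,
    Revenue,
    RANK() OVER (PARTITION BY Flat, Year ORDER BY Revenue DESC) AS revenue_rank
  FROM monthly
)
-- Step 3: keep the top 3 months per Flat and year.
SELECT Flat, Year, Month, revenue_rank, Revenue
FROM ranked
WHERE revenue_rank <= 3
ORDER BY Flat, Year, revenue_rank, Month;
```

Because RANK() assigns the same rank to ties, a Flat can return more than three rows for a year when two months have identical revenue. ROW_NUMBER() would cap it at exactly three, at the cost of picking between tied months arbitrarily.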